BOP: Benchmark for 6D Object Pose Estimation

Submission: Pix2Pose-BOP19-ICCV19/HB/Basic

Submission name

Basic

Submission time (UTC)

Oct. 14, 2019, 9:43 p.m.

User

kirumang

Task

6D localization of seen objects

Dataset

HB

Training model type

Default

Training image type

Synthetic (custom)

Description

Poses of rendered images: the same poses as in the rendered training images of the T-LESS dataset.
Images for training Mask R-CNN: 200,000 images, 5 epochs; rendered images were cropped and pasted onto random background images from COCO 2017.
Training of Pix2Pose: 33,000 iterations for each object.
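
The crop-and-paste step mentioned in the description can be illustrated with a minimal sketch. This is not the author's code: the function name `paste_on_background`, the array sizes, and the random placement are assumptions made for illustration, and a random array stands in for a COCO background image.

```python
import numpy as np

def paste_on_background(rendered_rgb, mask, background, top, left):
    """Paste the masked pixels of a rendered crop onto a copy of a background.

    rendered_rgb: (h, w, 3) uint8 crop of the rendered object (hypothetical input)
    mask:         (h, w) bool array, True where the object is visible
    background:   (H, W, 3) uint8 background image
    top, left:    position of the crop's top-left corner in the background
    """
    out = background.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]  # view into `out`
    region[mask] = rendered_rgb[mask]         # overwrite object pixels only
    return out

rng = np.random.default_rng(0)
# Stand-in for a background photo (assumption: any RGB image works here).
background = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)
# Stand-in for a rendered object crop with a rectangular visible area.
rendered = np.full((40, 50, 3), 200, dtype=np.uint8)
mask = np.zeros((40, 50), dtype=bool)
mask[10:30, 15:35] = True

# Random placement that keeps the crop fully inside the background.
top = int(rng.integers(0, 120 - 40 + 1))
left = int(rng.integers(0, 160 - 50 + 1))
composite = paste_on_background(rendered, mask, background, top, left)
```

In a detector-training setup, the pasted mask and its bounding box would also serve as the annotation for the composite image.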

Evaluation scores

AR:	0.200
AR_MSPD:	0.311
AR_MSSD:	0.153
AR_VSD:	0.136
average_time_per_image:	0.645
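
In the BOP Challenge 2019 evaluation methodology, the overall AR score is the mean of the three per-metric average recalls (AR_VSD, AR_MSSD, AR_MSPD). The short check below, a sketch using only the values listed above, shows that the reported numbers are consistent with that definition.

```python
# Per-metric average recalls as reported for this submission.
scores = {"AR_VSD": 0.136, "AR_MSSD": 0.153, "AR_MSPD": 0.311}

# Overall AR as the unweighted mean of the three metrics.
ar = sum(scores.values()) / len(scores)
print(f"AR = {ar:.3f}")  # AR = 0.200
```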

Method: Pix2Pose-BOP19-ICCV19

User	kirumang
Publication	Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation, ICCV 2019
Implementation	https://github.com/kirumang/Pix2Pose
Training image modalities	RGB
Test image modalities	RGB
Description	Results were obtained by training the entire pipeline with fixed parameters for all datasets. Because some parameters were changed, such as fewer training iterations and the use of real or synthetic training images, the results can differ from those reported in the paper. As described in the paper, a detection method provides 2D bounding boxes for all objects in a dataset. Unlike in the paper, we used Mask R-CNN for 2D detection, which also provides a segmentation mask for each detected object. This mask is compared with the valid mask predicted by the Pix2Pose network to compute the score. As in the paper, one Pix2Pose network is trained per object in the dataset (e.g., for T-LESS, 1 Mask R-CNN for 2D detection + 30 Pix2Pose networks for 6D pose estimation). No further refinement is applied. For training, real training images were used where provided. For datasets without real training images, we rendered images from the uniformly sampled viewpoints defined in the T-LESS dataset (we used the pose data in "train_render_reconst" of the T-LESS dataset).
Computer specifications	i7-9700K / GTX 1070-Ti / RAM 32G
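
The method description says that the detector's segmentation mask is compared with the valid mask predicted by the Pix2Pose network to obtain a score, but it does not say which comparison is used. The sketch below assumes intersection over union (IoU), a common choice for comparing binary masks; the function name `mask_iou` and the toy masks are illustrative and not taken from the Pix2Pose implementation.

```python
import numpy as np

def mask_iou(det_mask, valid_mask):
    """Intersection over union of two boolean masks of equal shape.

    Assumption: IoU is used as the agreement score; the source only states
    that the two masks are compared.
    """
    inter = np.logical_and(det_mask, valid_mask).sum()
    union = np.logical_or(det_mask, valid_mask).sum()
    return float(inter) / float(union) if union > 0 else 0.0

# Toy example: the detector mask covers 8 pixels, the predicted valid mask 4,
# and the valid mask lies entirely inside the detector mask.
det_mask = np.zeros((4, 4), dtype=bool)
det_mask[0:2, 0:4] = True
valid_mask = np.zeros((4, 4), dtype=bool)
valid_mask[0:2, 0:2] = True

score = mask_iou(det_mask, valid_mask)
print(score)  # 0.5
```

A score of this kind could be used to rank pose hypotheses when several detections of the same object are present, although the description does not confirm how the score is used downstream.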