Submission: Pix2Pose-BOP19-ICCV19/HB/Basic

Submission name Basic
Submission time (UTC) Oct. 14, 2019, 9:43 p.m.
User kirumang
Task 6D localization of seen objects
Dataset HB
Training model type Default
Training image type Synthetic (custom)
Description Poses of rendered images: the same poses used in the rendered training images of the T-LESS dataset. Images for training Mask R-CNN: 200,000 images, 5 epochs; rendered images are cropped and pasted onto random background images from COCO 2017. Training of Pix2Pose: 33,000 iterations for each object.
Evaluation scores
AR:0.200
AR_MSPD:0.311
AR_MSSD:0.153
AR_VSD:0.136
average_time_per_image:0.645
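The page does not say how the overall AR score relates to the three per-metric scores. In the BOP Challenge 2019 evaluation, AR is the arithmetic mean of AR_VSD, AR_MSSD and AR_MSPD, and the values listed here are consistent with that. A minimal check, where the function name `average_recall` is only illustrative:

```python
def average_recall(ar_vsd, ar_mssd, ar_mspd):
    """Mean of the three per-metric average recalls (BOP19 convention)."""
    return (ar_vsd + ar_mssd + ar_mspd) / 3.0

# Scores copied from the listing above.
scores = {"AR_MSPD": 0.311, "AR_MSSD": 0.153, "AR_VSD": 0.136}

ar = average_recall(scores["AR_VSD"], scores["AR_MSSD"], scores["AR_MSPD"])
print(f"AR = {ar:.3f}")
```

The result agrees with the reported AR of 0.200 to three decimal places.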

Method: Pix2Pose-BOP19-ICCV19

User kirumang
Publication Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation, ICCV 2019
Implementation https://github.com/kirumang/Pix2Pose
Training image modalities RGB
Test image modalities RGB
Description

Results are obtained after training the entire pipeline with fixed parameters for all datasets. Because some parameters were changed, such as using fewer training iterations and using real or synthetic images for training, the results can differ from those reported in the paper.

As described in the paper, a detection method is used to provide 2D bounding boxes of all objects in a dataset. Unlike in the paper, we used Mask R-CNN for 2D detection, which also provides a segmentation mask for each detected object. This mask is used to calculate the score by comparing it with the valid mask predicted by the Pix2Pose network. As in the paper, one Pix2Pose network is trained for each object in the dataset (e.g., for T-LESS, 1 Mask R-CNN for 2D detection + 30 Pix2Pose networks for 6D pose estimation). No further refinement is applied.
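The description does not state which measure is used to compare the detection mask with the predicted valid mask. As an illustration only, the sketch below assumes intersection over union (IoU) as the agreement score; this is an assumption, not necessarily what the submission computes. The function name `mask_iou` and the toy masks are hypothetical, and the masks are plain nested lists to keep the example dependency-free.

```python
def mask_iou(det_mask, valid_mask):
    """IoU of two binary masks given as equally sized 2D sequences.

    Assumption: IoU stands in for the unspecified mask-agreement score.
    """
    inter = 0
    union = 0
    for det_row, val_row in zip(det_mask, valid_mask):
        for d, v in zip(det_row, val_row):
            inter += bool(d) and bool(v)  # pixel is in both masks
            union += bool(d) or bool(v)   # pixel is in at least one mask
    return inter / union if union else 0.0

# Toy 2x4 masks: each covers 4 pixels, and they share one column (2 pixels).
det = [[1, 1, 0, 0],
       [1, 1, 0, 0]]   # hypothetical detector (Mask R-CNN) mask
val = [[0, 1, 1, 0],
       [0, 1, 1, 0]]   # hypothetical valid mask from the pose network

score = mask_iou(det, val)  # 2 shared pixels / 6 pixels in the union
```

A score near 1 would mean the pose network's valid region agrees with the detector's segmentation, and a low score would flag a likely mismatch between the two.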

For training, we used real training images where they are provided. For datasets without real training images, we rendered images from the uniformly sampled viewpoints defined in the T-LESS dataset (we used the pose data in "train_render_reconst" of the T-LESS dataset).

Computer specifications i7-9700K / GTX 1070-Ti / RAM 32G