|Publication||Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation, ICCV 2019|
|Training image modalities||RGB|
|Test image modalities||RGB|
Results are obtained after training the entire pipeline with fixed parameters across all datasets. Because some parameters differ from the paper, such as a smaller number of training iterations and the choice of real or synthetic training images, the results can differ from those reported in the paper.
As described in the paper, a detection method provides 2D bounding boxes of all objects in a dataset. Unlike the paper, we used Mask R-CNN for 2D detection, which also provides a segmentation mask for each detected object. This mask is compared against the valid-pixel mask predicted by the Pix2Pose network to compute a score. As in the paper, one Pix2Pose network is trained per object in the dataset (e.g., for T-LESS: 1 Mask R-CNN for 2D detection + 30 Pix2Pose networks for 6D pose estimation). No further refinement is applied.
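The mask-based score above can be sketched as an overlap ratio between the detector's mask and the network's predicted valid mask. This is a minimal illustration, not the exact scoring formula used in the pipeline; the IoU choice and the function name are assumptions.

```python
import numpy as np

def mask_overlap_score(detection_mask: np.ndarray, valid_mask: np.ndarray) -> float:
    """Score a detection by the overlap between the Mask R-CNN segmentation
    mask and the valid-pixel mask predicted by the Pix2Pose network.

    Both inputs are boolean arrays of the same HxW shape. IoU is used here
    as an illustrative choice; the actual score in the pipeline may differ.
    """
    intersection = np.logical_and(detection_mask, valid_mask).sum()
    union = np.logical_or(detection_mask, valid_mask).sum()
    return float(intersection) / float(union) if union > 0 else 0.0
```

A detection whose predicted valid mask barely overlaps the segmentation mask would receive a low score and could be rejected.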
For training, real images were used when the dataset provides them. For datasets without real training images, we rendered images from the uniformly sampled viewpoints defined in the T-LESS dataset (we used the pose data in "train_render_reconst" of the T-LESS dataset).
|Computer specifications||i7-9700K / GTX 1070-Ti / RAM 32G|