|Publication||Labbé et al., CosyPose: Consistent multi-view multi-object 6D pose estimation, ECCV 2020|
|Training image modalities||RGB|
|Test image modalities||RGB|
This submission uses the single-view (1 view) object pose estimation method introduced in Labbé et al., CosyPose: Consistent multi-view multi-object 6D pose estimation, ECCV 2020.
For each dataset, we train three networks: a Mask R-CNN detector (only its 2D detections are used at test time), a model for coarse pose estimation, and a model for iterative refinement. The refinement network is run for 4 iterations at test time.
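The test-time flow described above can be sketched as follows. This is a minimal illustration, not the CosyPose code: `estimate_poses` and the three callables `detect`, `coarse`, and `refine` are hypothetical stand-ins for the trained networks, and the sketch processes detections one at a time, whereas a real implementation would likely batch them.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Number of refinement iterations at test time, as stated in the description.
N_REFINER_ITERATIONS = 4


def estimate_poses(
    image: Any,
    detect: Callable[[Any], Sequence[Any]],
    coarse: Callable[[Any, Any], Any],
    refine: Callable[[Any, Any, Any], Any],
    n_iterations: int = N_REFINER_ITERATIONS,
) -> List[Tuple[Any, Any]]:
    """Run detection, coarse estimation, then iterative refinement.

    All three callables are hypothetical placeholders for the trained models.
    """
    results = []
    for detection in detect(image):
        # Initial pose estimate from the coarse model.
        pose = coarse(image, detection)
        # Each refinement step takes the current pose and returns an updated one.
        for _ in range(n_iterations):
            pose = refine(image, detection, pose)
        results.append((detection, pose))
    return results
```

With toy stand-ins, for example a `refine` that simply increments a counter, each detection ends up having been refined exactly four times.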
Only the provided PBR synthetic images are used for each dataset. We add data augmentation to the synthetic images as described in the paper. Pose networks are trained from scratch on all objects. Mask R-CNN has a ResNet-50 FPN backbone pre-trained on COCO.
The timing includes detection and pose estimation on all detections. We do not use the targets file to filter out the detections that are not evaluated.
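As a rough illustration of what the reported time covers, the sketch below measures one image end to end, assuming a wall-clock timer. The function `timed_inference` and the callables `detect` and `estimate_pose` are hypothetical names, not part of the actual evaluation code.

```python
import time
from typing import Any, Callable, List, Sequence, Tuple


def timed_inference(
    image: Any,
    detect: Callable[[Any], Sequence[Any]],
    estimate_pose: Callable[[Any, Any], Any],
) -> Tuple[List[Any], float]:
    """Return the poses for all detections and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # Every detection is kept: nothing is filtered against a targets file.
    detections = detect(image)
    # Pose estimation runs on all detections, including objects not evaluated.
    poses = [estimate_pose(image, d) for d in detections]
    elapsed = time.perf_counter() - start
    return poses, elapsed
```

Because no detection is discarded before pose estimation, the measured time grows with the total number of detections rather than with the number of evaluated targets.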
|Computer specifications||CPU: 20-core Intel Xeon 6164 @ 3.2 GHz, GPU: Nvidia V100|