|Training image modalities||RGB-D|
|Test image modalities||RGB-D|
Submitted to: BOP Challenge 2023
Training data: MegaPose-GSO and MegaPose-ShapeNetCore
Onboarding data: No
Used 3D models: Default, CAD
In this submission, GDRNPPDet_PBRReal [A] detections are used as input to our pose estimation method, which follows the coarse-to-fine strategy of MegaPose [B]. A single model is used for all datasets. The model was not trained on any individual BOP dataset, only on the MegaPose datasets (MegaPose-GSO and MegaPose-ShapeNetCore).
We use the multi-hypothesis method proposed in MegaPose [B]. For each detection, we extract the top-5 hypotheses from our coarse network and refine each one with the refinement network. The refined hypotheses are then scored by the coarse network, and the highest-scoring one is returned as the output.
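The multi-hypothesis selection described above can be sketched as follows. This is a minimal illustration, not the submitted implementation: `coarse_score` and `refine` are hypothetical callables standing in for the coarse and refinement networks, and the pose type is left generic. The defaults of 5 hypotheses and 5 refinement iterations follow the numbers stated in this submission.

```python
from typing import Callable, Sequence, TypeVar

Pose = TypeVar("Pose")


def select_pose(
    candidates: Sequence[Pose],
    coarse_score: Callable[[Pose], float],  # hypothetical: higher is better
    refine: Callable[[Pose], Pose],         # hypothetical: one refinement step
    top_k: int = 5,
    n_refine: int = 5,
) -> Pose:
    """Keep the top-k coarse hypotheses, refine each, re-score, return the best."""
    # 1. Rank all candidate poses with the coarse scorer and keep the top k.
    ranked = sorted(candidates, key=coarse_score, reverse=True)[:top_k]

    # 2. Refine every kept hypothesis for a fixed number of iterations.
    refined = []
    for pose in ranked:
        for _ in range(n_refine):
            pose = refine(pose)
        refined.append(pose)

    # 3. Re-score the refined hypotheses with the coarse scorer.
    return max(refined, key=coarse_score)
```

In practice the scoring and refinement calls would be batched on the GPU; the loop form is used here only to keep the control flow explicit.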
Our coarse network is based on the MegaPose [B] coarse network. The main differences from the original MegaPose paper are as follows:
Our refinement network is based on the Shape-Constraint Recurrent Flow (SCFlow) framework [C]. It estimates the optical flow from the rendered image to the input image. The main differences from the original SCFlow paper are as follows:
Note that the inputs to our neural networks are RGB images only; the depth images in the training dataset are used to train the visibility mask. In this submission, each coarse hypothesis is refined 5 times. In the RGB-D case, RANSAC-Kabsch is used for depth refinement after the flow is estimated at every refinement step.
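For readers unfamiliar with RANSAC-Kabsch, the sketch below shows the general technique on a set of 3D-3D correspondences. It assumes that `src` holds object-model points and `dst` holds the matching points back-projected from the depth image; how the correspondences are built from the estimated flow, as well as the iteration count and inlier threshold, are not specified in this submission and are illustrative assumptions here.

```python
import numpy as np


def kabsch(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def ransac_kabsch(src, dst, n_iters=100, inlier_thresh=0.01, seed=0):
    """Robust rigid fit: sample 3 pairs, fit, count inliers, refit on the best set.

    n_iters and inlier_thresh are assumed values, not taken from the submission.
    """
    rng = np.random.default_rng(seed)
    n = len(src)
    best = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(n, size=3, replace=False)   # minimal sample
        R, t = kabsch(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("RANSAC found fewer than 3 inliers")
    R, t = kabsch(src[best], dst[best])              # final refit on all inliers
    return R, t, best
```

The final refit on the full inlier set is a common design choice: the minimal 3-point samples are only used to find a consensus set, while the returned transform averages over all consistent correspondences.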
[A] Liu et al.: https://github.com/shanice-l/gdrnpp_bop2022
List of contributors: Sungphill Moon (firstname.lastname@example.org), Hyeontae Son (email@example.com)
If you have any questions, feel free to contact us.
|Computer specifications||GPU A100; CPU Intel Xeon Gold 6326 @ 2.90 GHz|