Submission: GenFlow-MultiHypo/TUD-L

Submission time (UTC) Sept. 25, 2023, 11:39 p.m.
User sp9103
Task Model-based 6D localization of unseen objects
Dataset TUD-L
Evaluation scores
AR: 0.849
AR_MSPD: 0.883
AR_MSSD: 0.883
AR_VSD: 0.782
average_time_per_image: 5.180
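
As a quick consistency check on the scores above: in the BOP evaluation, the overall AR is the mean of the three per-metric average recalls (AR_VSD, AR_MSSD, AR_MSPD). The function name below is illustrative and not part of any BOP toolkit API.

```python
def average_recall(ar_vsd: float, ar_mssd: float, ar_mspd: float) -> float:
    """Overall AR as the unweighted mean of the three per-metric recalls."""
    return (ar_vsd + ar_mssd + ar_mspd) / 3.0


# Scores reported for this submission on TUD-L.
ar = average_recall(ar_vsd=0.782, ar_mssd=0.883, ar_mspd=0.883)
print(f"{ar:.3f}")  # prints 0.849
```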

Method: GenFlow-MultiHypo

User sp9103
Publication https://arxiv.org/abs/2403.11510
Implementation -
Training image modalities RGB-D
Test image modalities RGB-D
Description

Submitted to: BOP Challenge 2023

Training data: MegaPose-GSO and MegaPose-ShapeNetCore

Onboarding data: No

Used 3D models: Default, CAD

Notes:

In this submission, CNOS_fastSAM [A] detections are used as the input to our pose estimation method. Our method follows the coarse-to-fine structure of MegaPose [B]. A single model is used for all datasets.

We use the multi-hypothesis method proposed in MegaPose [B]. The details are as follows: for each detection, we extract the top-5 hypotheses from our coarse network and refine each of them with the refinement network. The refined hypotheses are then scored by the coarse network, and the highest-scoring one is taken as the output.
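
The selection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `coarse_score` and `refine` are hypothetical callables standing in for the coarse and refinement networks, and the pose type is left generic. The defaults of 5 hypotheses and 5 refinement iterations are the values stated in these notes.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")


def select_best_pose(
    candidates: Sequence[Pose],
    coarse_score: Callable[[Pose], float],  # hypothetical stand-in for the coarse network
    refine: Callable[[Pose], Pose],         # hypothetical stand-in for one refiner step
    top_k: int = 5,
    n_refine_iters: int = 5,
) -> Tuple[Pose, float]:
    """Keep the top-k coarse hypotheses, refine each, re-score, return the best."""
    # 1. Rank all candidates by coarse score and keep the best top_k.
    ranked = sorted(candidates, key=coarse_score, reverse=True)[:top_k]

    # 2. Refine every kept hypothesis for a fixed number of iterations.
    refined = []
    for pose in ranked:
        for _ in range(n_refine_iters):
            pose = refine(pose)
        refined.append(pose)

    # 3. Re-score the refined hypotheses with the coarse scorer and pick the best.
    scores = [coarse_score(p) for p in refined]
    best = max(range(len(refined)), key=scores.__getitem__)
    return refined[best], scores[best]


# Toy usage: a "pose" is a single float and the true value is 3.0.
target = 3.0
best_pose, best_score = select_best_pose(
    candidates=[0.0, 1.0, 2.5, 10.0, -4.0, 7.0, 3.2],
    coarse_score=lambda p: -abs(p - target),
    refine=lambda p: p + 0.5 * (target - p),
)
```

Re-scoring after refinement matters because the ranking of hypotheses can change once each has been refined.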

Our coarse network is based on the MegaPose [B] coarse network. The main difference from the original MegaPose paper is as follows:

  • To avoid the cost of rendering a very large number of images, we render fewer images and run the coarse network on them. We then build a Gaussian mixture model (GMM) from the top-k hypotheses and run the coarse network again on hypotheses sampled from the GMM.
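
The notes do not specify how the GMM is constructed, so the following is only a sketch under stated assumptions: each pose is represented as a flat parameter vector (for example, a 3D rotation vector plus a 3D translation), each top-k hypothesis becomes the mean of one isotropic Gaussian component, and the component weights are a softmax over the coarse scores. The function name and the `sigma` parameter are hypothetical.

```python
import numpy as np


def sample_from_hypothesis_gmm(poses, scores, n_samples, sigma, rng):
    """Sample new pose hypotheses from a mixture centred on the top-k poses.

    poses:  (k, d) array, one pose parameter vector per hypothesis (assumed layout).
    scores: (k,) array of coarse scores, turned into mixture weights by softmax.
    sigma:  shared isotropic standard deviation of each component (assumption).
    """
    poses = np.asarray(poses, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Softmax over scores gives the mixture weights (numerically stabilised).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Pick a component for each sample, then perturb its mean with Gaussian noise.
    component = rng.choice(len(poses), size=n_samples, p=weights)
    noise = rng.normal(scale=sigma, size=(n_samples, poses.shape[1]))
    return poses[component] + noise


rng = np.random.default_rng(0)
top_k_poses = np.array([[0.0] * 6, [1.0] * 6, [2.0] * 6])
top_k_scores = np.array([0.1, 2.0, 0.5])
samples = sample_from_hypothesis_gmm(top_k_poses, top_k_scores, n_samples=50, sigma=1e-3, rng=rng)
```

The sampled poses would then be rendered and scored by the coarse network, concentrating rendering effort around the most promising regions of pose space.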

Our refinement network is based on the Shape-Constraint Recurrent Flow (SCFlow) framework [C]. It estimates the optical flow from the rendered image to the input image. The main differences from the original SCFlow paper are as follows:

  • Our network also estimates a visibility mask from the rendered image to the input image. To train the visibility prediction, we adopt the certainty estimation of DKM [D], which classifies depth consistency.
  • We replace the pose regression network with a differentiable PnP solver so that the refiner generalizes better.

Note that the inputs to our neural networks are RGB images only; the depth images in the training dataset are used only to train the visibility mask. In this submission, each coarse hypothesis is refined 5 times. In the RGB-D case, RANSAC-Kabsch is used for depth refinement after the flow is estimated at every refinement step.
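
For readers unfamiliar with RANSAC-Kabsch, here is a generic sketch of the technique, not the authors' code. It assumes 3D-3D correspondences are already available as two `(n, 3)` arrays (for instance, model points paired through the estimated flow with back-projected depth pixels; how the submission forms these pairs is not stated). The iteration count and inlier threshold are illustrative values.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def ransac_kabsch(src, dst, n_iters=200, inlier_thresh=0.01, rng=None):
    """Robust rigid alignment: Kabsch on random triplets, then refit on inliers."""
    rng = np.random.default_rng() if rng is None else rng
    best_inliers, best_count = None, -1
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        R, t = kabsch(src[idx], dst[idx])
        residual = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residual < inlier_thresh
        if inliers.sum() > best_count:
            best_count, best_inliers = int(inliers.sum()), inliers
    if best_count < 3:
        # Too few inliers to refit; fall back to using every correspondence.
        best_inliers = np.ones(len(src), dtype=bool)
    R, t = kabsch(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers
```

The final refit on all inliers is what makes the estimate more accurate than any single three-point sample.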

[A] Nguyen et al.: CNOS: A Strong Baseline for CAD-based Novel Object Segmentation, arXiv 2023
[B] Labbé et al.: MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare, CoRL 2022
[C] Hai et al.: Shape-Constraint Recurrent Flow for 6D Object Pose Estimation, CVPR 2023
[D] Edstedt et al.: DKM: Dense Kernelized Feature Matching for Geometry Estimation, CVPR 2023

List of contributors: Sungphill Moon (sungphill.moon@naverlabs.com), Hyeontae Son (son.ht@naverlabs.com)

If you have any questions, feel free to contact us.

Computer specifications GPU: V100; CPU: Intel Xeon Gold 6248 @ 2.5 GHz