Submission: Co-op (SAM6D, Coarse, RGBD)/HB/Coarse only, magsac++, det thres=0.4 / Intel(R) Core(TM) i9-14900K, RTX4090

Submission name Coarse only, magsac++, det thres=0.4 / Intel(R) Core(TM) i9-14900K, RTX4090
Submission time (UTC) Sept. 16, 2024, 2:48 a.m.
User sp9103
Task Model-based 6D detection of unseen objects
Dataset HB
Description
Evaluation scores
AP:0.744
AP_MSPD:0.749
AP_MSSD:0.738
average_time_per_image:0.928

Method: Co-op (SAM6D, Coarse, RGBD)

User sp9103
Publication
Implementation
Training image modalities RGB-D
Test image modalities RGB-D
Description

Submitted to: BOP Challenge 2024

Training data: MegaPose-GSO and MegaPose-ShapeNetCore

Onboarding data: 42 rendered templates

Used 3D models: Default, CAD

Notes:

Our coarse estimator is based on local feature matching between the query image and multiple pre-rendered templates. We model the query and rendered images as aggregations of multiple patches. The coarse network finds matches between the patch centers of the input crop and those of the rendered templates. The best of the 42 templates is selected, and a pose hypothesis is generated by RANSAC-PnP in the RGB case and by MAGSAC++ [A] in the RGB-D case.
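
The notes do not say how depth enters the RGB-D hypothesis step. One common approach, assumed here purely for illustration, is to lift each matched query patch center to a 3D camera-frame point with the pinhole model, so that a robust estimator can work on 3D-3D correspondences instead of the 2D-3D ones used by PnP. The function names, the match layout, and the intrinsics tuple below are hypothetical and not taken from the submission.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[int, int]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift pixel (u, v) with the given depth to a 3D camera-frame point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def lift_matches(matches: List[Tuple[Pixel, Point3D]],
                 depth_map: Sequence[Sequence[float]],
                 intrinsics: Tuple[float, float, float, float]
                 ) -> List[Tuple[Point3D, Point3D]]:
    """Turn (query pixel, model point) matches into 3D-3D pairs.

    Assumed layout: each match pairs a query patch center (u, v) with the
    3D model point seen at the matched template patch center.
    Matches without a valid depth reading are dropped.
    """
    fx, fy, cx, cy = intrinsics
    pairs = []
    for (u, v), model_point in matches:
        depth = depth_map[v][u]  # row index is v, column index is u
        if depth <= 0:  # missing or invalid depth reading
            continue
        camera_point = backproject(u, v, depth, fx, fy, cx, cy)
        pairs.append((camera_point, model_point))
    return pairs
```

The resulting pairs would then be passed to a robust rigid-transform estimator. MAGSAC++ itself is not implemented in this sketch.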

We use CroCo [B] pretraining for our coarse estimator. Note that the inputs to our neural networks are RGB images only. The 2D detector used is specified in parentheses in the title; it uses FastSAM object proposals.

For the 6D detection task, we perform pose estimation on detections with scores higher than 0.4.
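
The score filter described above can be sketched as follows. The detection record layout (a dict with a "score" key) and the function name are assumptions made for illustration. Only the 0.4 threshold and the strict "higher than" comparison come from the notes.

```python
from typing import Any, Dict, List


def filter_detections(detections: List[Dict[str, Any]],
                      score_threshold: float = 0.4) -> List[Dict[str, Any]]:
    """Keep only detections whose score is strictly above the threshold."""
    return [det for det in detections if det["score"] > score_threshold]
```

Pose estimation would then run only on the returned detections, which trades some recall for a lower per-image runtime.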

[A] Barath et al.: MAGSAC++, a fast, reliable and accurate robust estimator, CVPR 2020
[B] Weinzaepfel et al.: CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow, ICCV 2023

Computer specifications Intel(R) Core(TM) i9-14900K, RTX4090