Submission: FreeZeV2 (SAM6D)/HB/SAR threshold = -1

Submission name: SAR threshold = -1
Submission time (UTC): Sept. 20, 2024, 9:50 a.m.
User: andreacaraffa
Task: Model-based 6D detection of unseen objects
Dataset: HB
Evaluation scores:
AP: 0.782
AP_MSPD: 0.783
AP_MSSD: 0.781
average_time_per_image: 56.969
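Note: under the BOP 2024 protocol for 6D detection, the headline AP is the mean of the two pose-error scores, (0.783 + 0.781) / 2 = 0.782, and average_time_per_image is reported in seconds.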

Method: FreeZeV2 (SAM6D)

User: andreacaraffa
Publication: Caraffa et al.: FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models, ECCV 2024
Training image modalities: None
Test image modalities: RGB-D
Description:

Submitted to: BOP Challenge 2024

Training data: We do not train models on 6D pose estimation data. We use two frozen models pre-trained on web-scale 2D images and 3D point clouds, respectively.

Onboarding data: We render 162 templates for each object. We compute visual features from the rendered images, back-project them into 3D, and aggregate them. We compute geometric features directly from the 3D models and estimate geometric symmetries using the Chamfer distance.
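A rough sketch of the back-projection and symmetry steps (not the authors' code; the function names, the NumPy/SciPy tooling, and the threshold tau are illustrative assumptions), assuming pinhole intrinsics K and metric depth for each rendered template:

    import numpy as np
    from scipy.spatial import cKDTree

    def backproject_features(feat_map, depth, K):
        """Lift per-pixel features to 3D using depth and intrinsics.
        feat_map: (H, W, C) visual features (e.g., upsampled DINOv2);
        depth: (H, W) depth in meters, 0 where invalid; K: (3, 3) intrinsics.
        Returns (N, 3) camera-frame points and the (N, C) features at them."""
        H, W = depth.shape
        v, u = np.mgrid[0:H, 0:W]                 # pixel rows (v) and columns (u)
        valid = depth > 0
        z = depth[valid]
        x = (u[valid] - K[0, 2]) * z / K[0, 0]
        y = (v[valid] - K[1, 2]) * z / K[1, 1]
        return np.stack([x, y, z], axis=1), feat_map[valid]

    def is_symmetry(model_pts, R, tau=1e-3):
        """One-sided Chamfer test: treat rotation R as a symmetry if the
        rotated model stays within tau of the original on average.
        Assumes model_pts is centered at the object origin."""
        d, _ = cKDTree(model_pts).query(model_pts @ R.T)
        return d.mean() < tau

Aggregation would then transform each template's points into the object frame via the known render pose and pool the features over nearby model points.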

Used 3D models: CAD models for T-LESS, default models for the other datasets.

Notes: We do not use task-specific training. We leverage two pre-trained foundation models, i.e., GeDi [A] (geometric) and DINOv2 [B] (vision), to generate discriminative 3D point-level descriptors. We estimate each object's 6D pose via 3D registration based on RANSAC, followed by ICP refinement.
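A minimal sketch of such a registration stage using Open3D (an assumption; the description does not name a library), with the fused descriptors wrapped as Open3D features and all parameter values illustrative:

    import numpy as np
    import open3d as o3d

    def estimate_pose(scene_pts, scene_desc, model_pts, model_desc, voxel=0.005):
        """RANSAC feature-based registration followed by ICP refinement.
        *_pts: (N, 3) arrays; *_desc: (N, C) per-point descriptors
        (the GeDi+DINOv2 fusion itself is assumed to happen upstream)."""
        src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(model_pts))
        tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(scene_pts))

        sf = o3d.pipelines.registration.Feature()
        sf.data = model_desc.T.astype(np.float64)  # Open3D expects (C, N)
        tf = o3d.pipelines.registration.Feature()
        tf.data = scene_desc.T.astype(np.float64)

        # Coarse alignment: RANSAC over descriptor correspondences.
        result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
            src, tgt, sf, tf, mutual_filter=True,
            max_correspondence_distance=3 * voxel,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
            ransac_n=3,
            checkers=[o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(3 * voxel)],
            criteria=o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))

        # Fine alignment: point-to-point ICP started from the RANSAC pose.
        refined = o3d.pipelines.registration.registration_icp(
            src, tgt, max_correspondence_distance=voxel,
            init=result.transformation,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
        return refined.transformation              # 4x4 model-to-scene pose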

We use segmentation masks provided by SAM6D [C].
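Given one such instance mask, the scene cloud for a single object can be obtained by zeroing depth outside the mask and reusing the back-projection sketch above (variable names are illustrative):

    inst_depth = np.where(mask, depth, 0.0)  # keep depth only inside the mask
    scene_pts, scene_feats = backproject_features(scene_feat_map, inst_depth, K)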

[A] Poiesi et al.: Learning general and distinctive 3D local deep descriptors for point cloud registration, IEEE TPAMI 2023
[B] Oquab et al.: DINOv2: Learning robust visual features without supervision, arXiv 2023
[C] Lin et al.: SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation, CVPR 2024

Authors: Andrea Caraffa, Davide Boscaini, Amir Hamza and Fabio Poiesi

Computer specifications: GPU NVIDIA A40; CPU Intel(R) Xeon(R) Silver 4316 @ 2.30GHz