Submission name | |
---|---|
Submission time (UTC) | Sept. 17, 2024, 8:17 p.m. |
User | andreacaraffa |
Task | Model-based 6D detection of unseen objects |
Dataset | HB |
Description | |
Evaluation scores | |

User | andreacaraffa |
---|---|
Publication | Caraffa et al.: FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models, ECCV 2024 |
Implementation | |
Training image modalities | None |
Test image modalities | RGB-D |
Description | Submitted to: BOP Challenge 2024. Training data: We do not train models on 6D pose estimation data; we use two frozen models pre-trained on web-scale 2D images and 3D point clouds, respectively. Onboarding data: We render 162 templates for each object, compute visual features from the rendered images, back-project them into 3D, and aggregate them. We compute geometric features directly from the 3D models and estimate geometric symmetries using the Chamfer distance. Used 3D models: CAD models for T-LESS, default models for the other datasets. Notes: We do not use task-specific training. We leverage two pre-trained geometric and vision foundation models, i.e., GeDi [A] and DINOv2 [B], to generate discriminative point-level 3D descriptors. We estimate each object's 6D pose via RANSAC-based 3D registration followed by ICP refinement. We use segmentation masks provided by SAM6D [C]. [A] Poiesi et al.: Learning general and distinctive 3D local deep descriptors for point cloud registration, IEEE PAMI 2023. Authors: Andrea Caraffa, Davide Boscaini, Amir Hamza and Fabio Poiesi |
Computer specifications | GPU A40; CPU Intel(R) Xeon(R) Silver 4316 @ 2.30GHz |
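
The description mentions estimating geometric symmetries with the Chamfer distance but gives no details. The following is a minimal sketch of one way such a check could look, not the authors' implementation: a candidate rotation is accepted as a symmetry when the rotated point cloud stays within a tolerance of the original. The function names, the brute-force nearest-neighbour search, the choice of candidate rotations, and the tolerance `tol` are all assumptions made for illustration.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # Pairwise squared distances via broadcasting, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Mean nearest-neighbour distance in both directions.
    return np.sqrt(d2.min(axis=1)).mean() + np.sqrt(d2.min(axis=0)).mean()

def rotation_about_z(angle):
    """3x3 rotation matrix for a turn of `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def is_symmetry(points, rotation, tol):
    """Hypothetical test: `rotation` counts as a symmetry if the rotated,
    centred cloud has Chamfer distance below `tol` to the original."""
    centered = points - points.mean(axis=0)
    return chamfer_distance(centered, centered @ rotation.T) < tol

# Toy object: corners of a square prism, symmetric under 90-degree turns about z.
box = np.array([[x, y, z]
                for x in (-1.0, 1.0)
                for y in (-1.0, 1.0)
                for z in (-2.0, 2.0)])

sym_90 = is_symmetry(box, rotation_about_z(np.pi / 2), tol=1e-6)
sym_45 = is_symmetry(box, rotation_about_z(np.pi / 4), tol=1e-6)
```

The brute-force distance matrix is quadratic in the number of points, so a real point cloud sampled from a 3D model would likely need a spatial index such as a k-d tree for the nearest-neighbour queries.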