BOP: Benchmark for 6D Object Pose Estimation

Submission: FreeZeV2.2/IC-BIN

Submission time (UTC)

May 31, 2025, 11:06 p.m.

User

andreacaraffa

Task

Model-based 6D localization of unseen objects

Dataset

IC-BIN

Evaluation scores

AR:	0.711
AR_MSPD:	0.717
AR_MSSD:	0.725
AR_VSD:	0.691
average_time_per_image:	32.945 s
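
On BOP, the overall AR score for 6D localization is the mean of the three per-metric average recalls. A minimal check against the values listed above (the dictionary and function names are illustrative, not part of the BOP toolkit):

```python
# Per-metric average recalls copied from the scores listed above.
scores = {"AR_MSPD": 0.717, "AR_MSSD": 0.725, "AR_VSD": 0.691}


def average_recall(per_metric):
    """Return the mean of the per-metric average recalls."""
    return sum(per_metric.values()) / len(per_metric)


ar = average_recall(scores)
print(f"AR = {ar:.3f}")  # AR = 0.711
```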

Method: FreeZeV2.2

User	andreacaraffa
Publication	Caraffa et al.: FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models, ECCV 2024
Training image modalities	None
Test image modalities	RGB-D
Description	Submitted to: BOP Challenge 2025

Training data: We do not train models on 6D pose estimation data. We use two frozen models pre-trained on web-scale 2D images and 3D point clouds, respectively.

Onboarding data: We render 162 templates for each object. We compute visual features from the rendered images, we back-project them into 3D and aggregate them. We compute geometric features directly from the 3D models.

Used 3D models: CAD models for T-LESS, default models for the other datasets.

Notes: We do not use task-specific training. We leverage two pre-trained geometric and vision foundation models, i.e. GeDi [A] and DINOv2 [B] to generate 3D discriminative point-level descriptors. We estimate objects' 6D pose via 3D registration based on RANSAC followed by ICP refinement. In this version, we enhance RANSAC by incorporating feature similarity into its fitness evaluation. We use ensembles of methods for 2D segmentation of unseen objects on the BOP-Classic-Core datasets and for 2D detection on BOP-Industrial datasets.

[A] Poiesi et al.: Learning general and distinctive 3D local deep descriptors for point cloud registration, IEEE PAMI 2023
[B] Oquab et al.: DINOv2: Learning robust visual features without supervision, arXiv 2023

Authors: Andrea Caraffa, Davide Boscaini and Fabio Poiesi
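
The notes above mention RANSAC-based registration whose fitness evaluation incorporates feature similarity. The sketch below illustrates one way such a score could work and is not the authors' implementation. It assumes that putative correspondences are already given (row i of `src` matches row i of `dst`), that descriptors are unit-normalised, and that fitness is the sum of clipped cosine similarities over geometric inliers. All function names, parameters, and the threshold are hypothetical, and ICP refinement is omitted.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst, both (N, 3)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def feature_aware_fitness(R, t, src, dst, src_feat, dst_feat, inlier_thresh):
    """Assumed score: sum of cosine similarities over geometric inliers."""
    residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
    inliers = residuals < inlier_thresh
    cos = np.sum(src_feat * dst_feat, axis=1)  # valid for unit-norm features
    return float(np.sum(np.clip(cos[inliers], 0.0, None)))


def ransac_pose(src, dst, src_feat, dst_feat,
                inlier_thresh=0.01, iters=500, seed=0):
    """Return (score, R, t) of the best hypothesis over `iters` random triplets."""
    rng = np.random.default_rng(seed)
    best = (-np.inf, np.eye(3), np.zeros(3))
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        score = feature_aware_fitness(R, t, src, dst,
                                      src_feat, dst_feat, inlier_thresh)
        if score > best[0]:
            best = (score, R, t)
    return best
```

Compared with plain inlier counting, this score lets hypotheses supported by well-matched descriptors outrank hypotheses whose inliers agree only geometrically. In a full pipeline the returned pose would then be passed to an ICP refinement step.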
Computer specifications	GPU L40S; CPU AMD EPYC 9474F @ 1.64GHz