BOP: Benchmark for 6D Object Pose Estimation

Submission: MUSE/IC-BIN/muse_full

Submission name

muse_full

Submission time (UTC)

Aug. 26, 2025, 5:47 a.m.

User

csm8167

Task

Model-based 2D detection of unseen objects

Dataset

IC-BIN

Description

Evaluation scores

AP:	0.337
AP50:	0.626
AP75:	0.342
AP_large:	0.585
AP_medium:	0.362
AP_small:	0.027
AR1:	0.103
AR10:	0.403
AR100:	0.425
AR_large:	0.729
AR_medium:	0.438
AR_small:	0.027
average_time_per_image:	0.556

Method: MUSE

User	csm8167
Publication	Not Yet
Implementation
Training image modalities	None
Test image modalities	RGB
Description	Submitted to: BOP Challenge 2024
MUSE: Model-agnostic Unseen 2D Object Recognition via 3D-aware Similarity of Multi-Embeddings
We present MUSE, a training-free and model-agnostic framework for unseen 2D object recognition, leveraging 3D-aware similarity computed from multi-embedding descriptors. Specifically, MUSE integrates class-level and patch-level embeddings into a novel similarity metric, and introduces the Integrated von Mises-Fisher (I-vMF) similarity, which applies the von Mises-Fisher (vMF) distribution to weigh the contributions of 3D template views. This weighting reflects the assumption that high similarity scores are concentrated around the correct template view on the viewing sphere.
To further enhance reliability, we propose Confidence-Assisted Similarity (CAS), which modulates the I-vMF similarity using the uncertainty estimate of the vision model, giving more influence to confident predictions. As our approach relies solely on similarity computations over feature embeddings, MUSE is fully model-agnostic and can be integrated with any vision backbone without fine-tuning.
In our implementation, we use Grounding DINO and SAM2 to extract detection proposals, and adopt DINOv2-Large as the feature encoder for computing multi-level similarity.
Authors – Temporary Anonymous
Computer specifications	RTX 4090
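
The method description mentions weighting template views with a von Mises-Fisher distribution and modulating the result with a confidence estimate, but it does not give the formulas. The following is only a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `vmf_weights` and `vmf_weighted_similarity`, the concentration `kappa`, the mixing factor `alpha` between class-level and patch-level similarity, the choice of the best-scoring view as the vMF mean direction, and the use of a scalar `confidence` multiplier.

```python
import numpy as np

def vmf_weights(view_dirs, mean_dir, kappa=10.0):
    """Normalized vMF-style weights, proportional to exp(kappa * mu . v),
    over template view directions on the unit sphere."""
    view_dirs = np.asarray(view_dirs, dtype=float)
    mean_dir = np.asarray(mean_dir, dtype=float)
    view_dirs = view_dirs / np.linalg.norm(view_dirs, axis=1, keepdims=True)
    mean_dir = mean_dir / np.linalg.norm(mean_dir)
    logits = kappa * (view_dirs @ mean_dir)
    logits = logits - logits.max()  # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def vmf_weighted_similarity(cls_sim, patch_sim, view_dirs,
                            kappa=10.0, alpha=0.5, confidence=1.0):
    """Hypothetical aggregate score for one proposal against one object.

    cls_sim, patch_sim: per-template similarities, shape (n_views,)
    view_dirs: template viewing directions, shape (n_views, 3)
    """
    cls_sim = np.asarray(cls_sim, dtype=float)
    patch_sim = np.asarray(patch_sim, dtype=float)
    view_dirs = np.asarray(view_dirs, dtype=float)
    # Assumption: class-level and patch-level scores are mixed linearly.
    combined = alpha * cls_sim + (1.0 - alpha) * patch_sim
    # Assumption: the best-scoring template view is the vMF mean direction.
    best = int(np.argmax(combined))
    w = vmf_weights(view_dirs, view_dirs[best], kappa)
    # Assumption: confidence acts as a simple multiplicative factor.
    return confidence * float(w @ combined)
```

With `kappa = 0` the weights become uniform and the score reduces to the mean similarity over all views; larger `kappa` concentrates the weight on views near the best-matching one, which matches the stated assumption that high similarity clusters around the correct view.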