Submission: CNOS (FastSAM) - Static onboarding/HOPEv2/Static_onboarding

Submission name: Static_onboarding
Submission time (UTC): Aug. 24, 2025, 6:40 p.m.
User: nvnguyen
Task: Model-free 2D detection of unseen objects
Dataset: HOPEv2
Description
Evaluation scores
AP: 0.343
AP50: 0.520
AP75: 0.362
AP_large: 0.396
AP_medium: 0.032
AP_small: 0.089
AR1: 0.394
AR10: 0.457
AR100: 0.457
AR_large: 0.528
AR_medium: 0.084
AR_small: 0.089
average_time_per_image: 0.469

Method: CNOS (FastSAM) - Static onboarding

User: nvnguyen
Publication: https://arxiv.org/pdf/2307.11067
Implementation: https://github.com/nv-nguyen/cnos
Training image modalities: RGB
Test image modalities: RGB
Description

A simple baseline for model-free unseen object detection/segmentation with Fast Segment Anything (FastSAM) and DINOv2. This three-stage approach can work for any object without retraining:

Onboarding stage: For each object in the test dataset, we randomly select 100 reference images from the onboarding videos (50 images per video) and crop the object from these images using the provided 2D bounding box. Then we calculate the CLS-token descriptors of the crops using DINOv2. This process generates a set of reference descriptors of size "num_objects x 100 x C" for the testing dataset, where "num_objects" represents the number of test objects, and "C" denotes the descriptor size.
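The onboarding stage above can be sketched in a few lines of numpy. The `embed_crop` function below is a hypothetical stand-in for the DINOv2 CLS-token embedding (the real method runs each crop through DINOv2); it only serves to make the shape of the resulting reference bank, "num_objects x 100 x C", concrete.

```python
import zlib
import numpy as np

def embed_crop(crop, dim=1024):
    # Hypothetical stand-in for the DINOv2 CLS-token embedding: the real
    # method runs each crop through DINOv2 and takes the CLS token. Here we
    # derive a deterministic pseudo-descriptor so the pipeline is runnable.
    rng = np.random.default_rng(zlib.crc32(crop.tobytes()))
    v = rng.standard_normal(dim).astype(np.float32)
    return v / np.linalg.norm(v)           # unit-norm for cosine similarity

def build_reference_bank(objects_crops, dim=1024):
    # objects_crops: per test object, a list of 100 cropped reference images
    # (50 from each of the two onboarding videos).
    bank = np.stack([
        np.stack([embed_crop(c, dim) for c in crops])  # (100, C) per object
        for crops in objects_crops
    ])
    return bank                            # (num_objects, 100, C)

# Toy run: 3 objects x 100 crops of 16x16 RGB
crops = [[np.full((16, 16, 3), (o * 7 + i) % 256, np.uint8) for i in range(100)]
         for o in range(3)]
bank = build_reference_bank(crops, dim=64)
print(bank.shape)                          # (3, 100, 64)
```

With real DINOv2 features, `dim` would be the model's embedding size (e.g. 1024 for ViT-L), but the banking logic is unchanged.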

Proposal stage: We generate object proposals using FastSAM. Each proposal is defined by a binary mask and a 2D bounding box of the mask.
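The pairing of each proposal's binary mask with its 2D bounding box can be sketched as below; this is a generic mask-to-box conversion under the assumption of an (x, y, w, h) box convention, not code from the CNOS repository.

```python
import numpy as np

def mask_to_bbox(mask):
    # Convert a binary proposal mask to an (x, y, w, h) bounding box,
    # mirroring how each FastSAM proposal is paired with the box of its mask.
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # empty mask: no box
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    return (int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1))

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True          # a 3x4 blob
print(mask_to_bbox(mask))      # (3, 2, 4, 3)
```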

Matching stage: We calculate the CLS-token DINOv2 descriptors for the FastSAM proposals and compare them with the reference descriptors using cosine similarity. For each proposal, this yields a similarity matrix of size "num_objects x 100". We then average the similarities over the 100 views to obtain a score for the proposal with respect to each test object. Finally, we assign to each proposal the object ID with the highest score (argmax).
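The matching stage reduces to a batched dot product once all descriptors are unit-normalised (cosine similarity of unit vectors is their dot product). A minimal numpy sketch, with descriptor shapes assumed as in the onboarding stage:

```python
import numpy as np

def match_proposals(proposal_desc, ref_bank):
    # proposal_desc: (P, C) unit-norm descriptors of the FastSAM proposals.
    # ref_bank:      (num_objects, 100, C) unit-norm reference descriptors.
    # (P, C) x (num_objects, 100, C) -> (P, num_objects, 100) cosine similarities
    sim = np.einsum('pc,onc->pon', proposal_desc, ref_bank)
    scores = sim.mean(axis=2)          # average over the 100 views
    obj_ids = scores.argmax(axis=1)    # best-matching object per proposal
    return obj_ids, scores.max(axis=1)

# Toy example: 2 objects with orthogonal descriptors, 1 proposal matching object 1
C = 4
ref = np.zeros((2, 100, C), np.float32)
ref[0, :, 0] = 1.0   # object 0 points along axis 0
ref[1, :, 1] = 1.0   # object 1 points along axis 1
prop = np.array([[0.0, 1.0, 0.0, 0.0]], np.float32)
ids, conf = match_proposals(prop, ref)
print(ids)           # -> [1]
```

Averaging over all 100 views follows the description above; other view-aggregation rules (e.g. mean of top-k similarities) would slot into the same `scores` reduction.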

Computer specifications: NVIDIA V100 GPU