BOP: Benchmark for 6D Object Pose Estimation

Submission: gfreedet2-2d_fastsam/HOPEv2

Submission name

Submission time (UTC)

Sept. 1, 2025, 9:35 a.m.

User

gfreedet

Task

Model-free 2D detection of unseen objects

Dataset

HOPEv2

Description

Evaluation scores

AP:	0.369
AP50:	0.561
AP75:	0.392
AP_large:	0.426
AP_medium:	0.034
AP_small:	0.089
AR1:	0.409
AR10:	0.470
AR100:	0.470
AR_large:	0.538
AR_medium:	0.090
AR_small:	0.089
average_time_per_image:	0.287
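
The AP50 and AP75 names suggest COCO-style detection metrics, in which a predicted box counts as a match only if its intersection over union (IoU) with a ground-truth box reaches 0.50 or 0.75, and AP averages over IoU thresholds from 0.50 to 0.95. A minimal sketch of that IoU test follows, assuming boxes in (x1, y1, x2, y2) pixel format. The function name `box_iou` and the example boxes are illustrative and not taken from the BOP toolkit.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: the prediction is shifted 20 px right of the ground truth.
pred = (10, 10, 110, 110)
gt = (30, 10, 130, 110)

iou = box_iou(pred, gt)
matches_at_50 = iou >= 0.50  # counts toward AP50
matches_at_75 = iou >= 0.75  # counts toward AP75
```

A detection like this one would count as correct at the looser threshold but not at the stricter one, which is why AP75 (0.392) is lower than AP50 (0.561) above.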

Method: gfreedet2-2d_fastsam

User	gfreedet
Publication	Not yet
Implementation
Training image modalities	None
Test image modalities	RGB
Description	Training data: None.
Onboarding data: Model-free. We use static onboarding sequences to reconstruct 3DGS models and render templates from them for 2D detection (162 rendered images + 64 sampled static onboarding images). The average onboarding time for reconstructing a GS object and generating its templates/descriptors is about 105 s.
Notes: For unified 3DGS reconstruction from pinhole and fisheye images, we preprocess the static onboarding images with an adaptive perspective cropping strategy. The object Gaussians are then trained on these cropped pinhole images for 10k iterations. With the resulting GS models, we prepare templates as described above. For 2D detection, we use a modified CNOS augmented with appearance scores. The descriptor model is DINOv2, and the segmentor is FastSAM.
Authors: Temporary Anonymity
Computer specifications	NVIDIA L20
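
The description names a modified CNOS with DINOv2 descriptors and FastSAM proposals but does not spell out the matching step. As a rough illustration only, the sketch below scores one proposal descriptor against per-object template descriptors by cosine similarity and aggregates with the mean of the top-k template scores. That aggregation, the function names, the toy 3-dimensional descriptors, and `top_k` are all assumptions made for this example, and the appearance-score modification used in this submission is not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def score_proposal(proposal, templates_by_object, top_k=2):
    """Mean of the top-k template similarities for each object (assumed aggregation)."""
    scores = {}
    for obj_id, templates in templates_by_object.items():
        if not templates:
            continue  # an object with no templates cannot be scored
        sims = sorted((cosine(proposal, t) for t in templates), reverse=True)
        best = sims[:top_k]
        scores[obj_id] = sum(best) / len(best)
    return scores


def classify(proposal, templates_by_object, top_k=2):
    """Return the best-scoring object id and its score."""
    scores = score_proposal(proposal, templates_by_object, top_k)
    best_obj = max(scores, key=scores.get)
    return best_obj, scores[best_obj]


# Toy stand-ins for descriptors; real ones would come from a descriptor model.
templates_by_object = {
    "obj_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.0, 0.2]],
    "obj_b": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}
proposal = [0.95, 0.05, 0.0]

label, score = classify(proposal, templates_by_object)
```

In the setting described above, each object would have 226 templates (162 rendered plus 64 sampled onboarding images) rather than the three used here, and every segmentation proposal in a test image would be scored this way.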