Submission: gfreedet2-2d_sam2.1-p/HOPEv2

Submission name
Submission time (UTC) Sept. 2, 2025, 5:58 a.m.
User gfreedet
Task Model-free 2D detection of unseen objects
Dataset HOPEv2
Description
Evaluation scores
AP:0.452
AP50:0.657
AP75:0.487
AP_large:0.519
AP_medium:0.101
AP_small:0.044
AR1:0.480
AR10:0.536
AR100:0.536
AR_large:0.615
AR_medium:0.122
AR_small:0.044
average_time_per_image:2.039

Method: gfreedet2-2d_sam2.1-p

User gfreedet
Publication Not yet
Implementation
Training image modalities None
Test image modalities RGB
Description

Training data: None

Onboarding data:
Model-free: static onboarding sequences are used to reconstruct 3DGS models, from which templates are rendered for 2D detection (162 rendered images + 64 sampled static onboarding images). Reconstructing a GS object and generating its templates/descriptors takes about 105 s on average.
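The count of 162 rendered images equals the number of vertices of an icosahedron subdivided twice (12, then 42, then 162), a common way to place roughly uniform viewpoints around an object. The submission does not state how its viewpoints are chosen, so the sketch below only illustrates that assumption; the function name `icosphere_vertices` is hypothetical.

```python
import math

# Assumption: render viewpoints lie on a twice-subdivided icosahedron.
# The submission does not confirm this; it only reports 162 rendered images.

def _normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def icosphere_vertices(subdivisions=2):
    """Return unit-sphere viewpoints from a subdivided icosahedron."""
    t = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
    verts = [_normalize(v) for v in [
        (-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
        (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
        (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]]
    faces = [
        (0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
        (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
        (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
        (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]

    for _ in range(subdivisions):
        midpoint = {}  # edge (i, j) with i < j -> index of its midpoint vertex

        def mid(i, j):
            key = (min(i, j), max(i, j))
            if key not in midpoint:
                a, b = verts[i], verts[j]
                # Push the edge midpoint back onto the unit sphere.
                verts.append(_normalize(tuple((p + q) / 2.0 for p, q in zip(a, b))))
                midpoint[key] = len(verts) - 1
            return midpoint[key]

        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
            # Each triangle splits into four smaller triangles.
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return verts

views = icosphere_vertices(2)
print(len(views))  # 162
```

Each returned vector can serve as a camera direction looking at the object centre. Together with the 64 sampled onboarding images, this gives 226 templates in total.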

Notes: To reconstruct 3DGS models in a unified way from both pinhole and fisheye images, we preprocess the static onboarding images with an adaptive perspective cropping strategy. The object Gaussians are then rapidly trained on the cropped pinhole images for 10k iterations. From the resulting GS models, we prepare templates as described above. For 2D detection, we use a modified CNOS augmented with appearance scores, with DINOv2 as the descriptor model and SAM2.1 as the segmentor. A postprocessing step fills small holes and removes small isolated areas.
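The postprocessing step (filling small holes and removing small isolated areas) can be sketched on a binary mask as follows. This is a minimal illustration, not the submission's implementation: the function names, the 4-connectivity, and the area thresholds `max_hole_area` and `min_island_area` are all assumptions.

```python
from collections import deque

# Illustrative sketch only: names, thresholds, and 4-connectivity are
# assumptions, not details taken from the submission.

def components(mask, value):
    """Yield 4-connected components of cells equal to `value` as (row, col) lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] != value or seen[r][c]:
                continue
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and mask[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield comp

def clean_mask(mask, max_hole_area=4, min_island_area=4):
    """Fill small enclosed holes, then drop small isolated foreground regions."""
    out = [row[:] for row in mask]
    h, w = len(out), len(out[0])
    # A hole is a background component that does not touch the image border.
    for comp in list(components(out, 0)):
        touches_border = any(y in (0, h - 1) or x in (0, w - 1) for y, x in comp)
        if not touches_border and len(comp) <= max_hole_area:
            for y, x in comp:
                out[y][x] = 1
    # An island is a foreground component below the minimum area.
    for comp in list(components(out, 1)):
        if len(comp) < min_island_area:
            for y, x in comp:
                out[y][x] = 0
    return out

mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],  # one-pixel hole at (2, 2), one-pixel island at (2, 5)
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = clean_mask(mask)
```

Holes are filled before islands are removed so that a region whose area only reaches the threshold once its holes are filled is kept.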

Authors: Temporary Anonymity

Computer specifications NVIDIA L20