Submission: gfreedet2-6d_lite/HOT3D

Submission name
Submission time (UTC) Sept. 3, 2025, 8:12 a.m.
User gfreedet
Task Model-free 6D detection of unseen objects
Dataset HOT3D
Description
Evaluation scores
AP: 0.423
AP_25: 0.374
AP_25_mm: 0.120
AP_MSPD: 0.472
AP_MSSD: 0.374
AP_MSSD_mm: 0.120
average_time_per_image: 10.387 s
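The summary AP appears to be the mean of AP_MSPD and AP_MSSD, which is the BOP convention for 6D detection. A quick check of that arithmetic with the reported values:

```python
# Assumed BOP-style summary score for 6D detection:
# AP is the mean of the APs under the MSPD and MSSD pose-error functions.
scores = {"AP_MSPD": 0.472, "AP_MSSD": 0.374}

ap = (scores["AP_MSPD"] + scores["AP_MSSD"]) / 2
print(f"AP = {ap:.3f}")  # prints: AP = 0.423
```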

Method: gfreedet2-6d_lite

User gfreedet
Publication Not yet
Implementation
Training image modalities None
Test image modalities RGB
Description

Training data: None

Onboarding data:

Model-free: static onboarding sequences are used to reconstruct 3DGS models, from which templates are prepared for 2D detection (162 rendered images plus 64 images sampled from the static onboarding sequences) and for coarse pose estimation (~800 rendered images). For coarse 6D detection, the template image size is 280 px and the descriptor model is DINOv2-S instead of DINOv2-L. On average, reconstructing one GS object and generating its templates and descriptors for both 2D detection and coarse 6D detection takes about 215 s.
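The submission does not say how the ~800 coarse-pose viewpoints are chosen. A minimal sketch of one common option, assuming Fibonacci-sphere sampling of camera centers around an object at the origin and an OpenCV-style look-at camera; the helper names and the 0.5 radius are hypothetical:

```python
import numpy as np

def fibonacci_viewpoints(n: int, radius: float = 1.0) -> np.ndarray:
    """Return n camera centers spread near-uniformly on a sphere (hypothetical helper)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n                          # heights evenly spaced in (-1, 1)
    ring = np.sqrt(1.0 - z * z)                    # ring radius at each height
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i         # azimuth advances by the golden angle
    return radius * np.stack([ring * np.cos(phi), ring * np.sin(phi), z], axis=1)

def look_at(center: np.ndarray, up=(0.0, 0.0, 1.0)) -> np.ndarray:
    """World-to-camera rotation (OpenCV axes: x right, y down, z forward) for a
    camera at `center` looking at the origin, where the object is assumed to sit."""
    up = np.asarray(up, dtype=float)
    z = -center / np.linalg.norm(center)
    if abs(z @ up) > 0.999:                        # looking straight up/down: pick another up
        up = np.array([0.0, 1.0, 0.0])
    x = np.cross(z, up)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)                             # completes a right-handed frame
    return np.stack([x, y, z], axis=0)

# Hypothetical usage: 800 template viewpoints at 0.5 m from the object center.
centers = fibonacci_viewpoints(800, radius=0.5)
rotations = [look_at(c) for c in centers]
```

Fibonacci sampling is used here only because it gives near-uniform coverage for any view count; the actual templates may be sampled differently.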

Notes:

To reconstruct 3DGS models in a unified way from both pinhole and fisheye images, we preprocess the static onboarding images with an adaptive perspective cropping strategy. The object Gaussians are then trained on these cropped pinhole images for 10k iterations. The resulting GS models are used to prepare the templates described above.
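Perspective cropping resamples a fisheye image into a virtual pinhole camera rotated toward a chosen direction. A minimal sketch of the pixel mapping, assuming an ideal equidistant fisheye model (r = f_fish * theta); the function names and parameters are hypothetical and not taken from the submission:

```python
import numpy as np

def crop_rotation(view_dir: np.ndarray) -> np.ndarray:
    """Rotation whose columns are the crop camera's x, y, z axes in the fisheye frame."""
    z = view_dir / np.linalg.norm(view_dir)
    ref = np.array([0.0, 1.0, 0.0])                # fisheye 'down' axis as reference
    if abs(z @ ref) > 0.99:                        # avoid a degenerate cross product
        ref = np.array([1.0, 0.0, 0.0])
    x = np.cross(ref, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z], axis=1)

def pinhole_crop_maps(view_dir, f_pin, size, f_fish, cx, cy):
    """For each pixel of a size x size virtual pinhole crop, return the source
    (x, y) coordinates in an equidistant fisheye image."""
    R = crop_rotation(np.asarray(view_dir, dtype=float))
    c = (size - 1) / 2.0
    u, v = np.meshgrid(np.arange(size), np.arange(size))   # u: column, v: row
    rays = np.stack([(u - c) / f_pin, (v - c) / f_pin,
                     np.ones_like(u, dtype=float)], axis=-1)
    rays = rays @ R.T                                      # crop frame -> fisheye frame
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    theta = np.arccos(np.clip(rays[..., 2], -1.0, 1.0))    # angle from the optical axis
    phi = np.arctan2(rays[..., 1], rays[..., 0])           # azimuth in the image plane
    r = f_fish * theta                                     # equidistant projection
    map_x = (cx + r * np.cos(phi)).astype(np.float32)
    map_y = (cy + r * np.sin(phi)).astype(np.float32)
    return map_x, map_y
```

The maps can be passed to `cv2.remap(fisheye_img, map_x, map_y, cv2.INTER_LINEAR)` to produce the pinhole crop. Making the crop adaptive would amount to choosing `view_dir` toward the object and `f_pin` from its angular extent, which is also an assumption here. HOT3D's real cameras use calibrated fisheye models with extra distortion terms rather than this ideal projection.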

For 2D detection, we use a modified CNOS augmented with appearance scores, with DINOv2 as the descriptor model and FastSAM as the segmentor.
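CNOS scores each FastSAM proposal by comparing its DINOv2 descriptor with the template descriptors. A minimal sketch of that matching step combined with a patch-level appearance score, using plain NumPy arrays in place of real DINOv2 features. The appearance formulation (in the spirit of SAM-6D) and the equal-weight fusion `w=0.5` are assumptions, since the submission does not describe its modification:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def semantic_score(prop_desc, tmpl_desc, k=5):
    """Mean of the top-k cosine similarities between one proposal's global
    descriptor (D,) and all template descriptors (T, D)."""
    sims = l2_normalize(tmpl_desc) @ l2_normalize(prop_desc)
    k = min(k, sims.shape[0])
    return float(np.sort(sims)[-k:].mean())

def appearance_score(prop_patches, tmpl_patches):
    """Hypothetical patch-level score: each proposal patch (P, D) is matched to its
    most similar patch of the best template (Q, D); the maxima are averaged."""
    sims = l2_normalize(prop_patches) @ l2_normalize(tmpl_patches).T   # (P, Q)
    return float(sims.max(axis=1).mean())

def detection_score(prop_desc, tmpl_desc, prop_patches, best_tmpl_patches, k=5, w=0.5):
    # Assumed fusion rule: a weighted average of the semantic and appearance scores.
    return (w * semantic_score(prop_desc, tmpl_desc, k)
            + (1 - w) * appearance_score(prop_patches, best_tmpl_patches))
```

A top-k average makes the semantic score less sensitive to a single spuriously similar template than taking the maximum.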

For coarse 6D pose detection, we extend FoundPose to the model-free setting by using templates rendered from the 3DGS models. We further extend FoundPose to apply correct, unified perspective cropping to both pinhole and fisheye query images. For this lite version, we implement a retrieval-only variant that discards FoundPose's PnP-RANSAC step for faster coarse pose estimation.
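In a retrieval-only variant, the coarse pose comes directly from the best-matching template rather than from PnP-RANSAC. A minimal sketch of how a retrieved template pose could be lifted back into the original query camera, assuming the query crop is a virtual pinhole view rotated by `R_crop`, that the object is centered in the crop, and that depth is recovered from apparent size by similar triangles. The function name and the size-based depth rule are hypothetical and not taken from FoundPose:

```python
import numpy as np

def lift_template_pose(R_tmpl, z_tmpl, size_tmpl_px, size_query_px, R_crop):
    """Coarse object-to-camera pose in the original query camera.

    R_tmpl        : (3, 3) object-to-camera rotation of the retrieved template
    z_tmpl        : depth at which the template was rendered
    size_*_px     : apparent object size in the template / query crop (same focal length)
    R_crop        : (3, 3) rotation whose columns are the crop camera axes
                    expressed in the original camera frame
    """
    # Similar triangles: with equal focal lengths, depth scales inversely with size.
    z = z_tmpl * size_tmpl_px / size_query_px
    # Assume the object is centered in the crop, i.e. on the crop's optical axis.
    t_crop = np.array([0.0, 0.0, z])
    # X_orig = R_crop @ X_crop, so rotate both pose components out of the crop frame.
    R = R_crop @ R_tmpl
    t = R_crop @ t_crop
    return R, t
```

Keeping the crop rotation explicit is what lets the same template pose be reused for pinhole and fisheye queries alike.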

For fine 6D pose detection, we extend GoTrack (which estimates the pose from render-to-observation flow followed by PnP-RANSAC) to the model-free setting by using the gsplat renderer to render the 3DGS models.
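Flow-based refinement turns render-to-observation flow into 2D-3D correspondences for PnP-RANSAC. A minimal NumPy sketch of that correspondence step, assuming the rendering provides a per-pixel z-depth map and that the rendering and the query crop share the intrinsics `K`; the helper name is hypothetical and flow estimation itself is not shown:

```python
import numpy as np

def flow_correspondences(depth, flow, K, R, t):
    """Build 2D-3D correspondences from render-to-observation flow.

    depth : (H, W) rendered z-depth (0 where the object is absent)
    flow  : (H, W, 2) flow (dx, dy) from rendered pixels to the observed image
    K     : (3, 3) intrinsics shared by the rendering and the query crop
    R, t  : object-to-camera pose used for the rendering
    Returns object-frame 3D points (N, 3) and observed 2D points (N, 2).
    """
    v, u = np.nonzero(depth > 0)                        # rows, cols of object pixels
    z = depth[v, u]
    pix = np.stack([u, v, np.ones_like(u)], axis=0).astype(float)   # (3, N)
    cam = (np.linalg.inv(K) @ pix) * z                  # back-project into camera frame
    obj = (R.T @ (cam - t.reshape(3, 1))).T             # camera -> object frame
    img = np.stack([u, v], axis=1).astype(float) + flow[v, u]
    return obj, img
```

The resulting arrays could then be passed to OpenCV, e.g. `cv2.solvePnPRansac(obj, img, K, None)`, to recover the refined pose while RANSAC discards outlier flow vectors.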

Authors: Temporarily anonymous

Computer specifications NVIDIA L20