Submission: OPFormer-Coarse (CNOS)/YCB-V/Coarse Main

Submission name Coarse Main
Submission time (UTC) Jan. 29, 2025, 8:47 p.m.
User morozart
Task Model-based 6D localization of unseen objects
Dataset YCB-V
Description
Evaluation scores
AR:0.646
AR_MSPD:0.782
AR_MSSD:0.613
AR_VSD:0.543
average_time_per_image:0.398

Method: OPFormer-Coarse (CNOS)

User morozart
Publication Not yet
Implementation
Training image modalities RGB-D
Test image modalities RGB
Description

Submitted to:  BOP Challenge 2024

Training data:  MegaPose-ShapeNet and MegaPose-GSO synthetic datasets

Onboarding data: Model-based: 42 templates are rendered from the CAD model using BlenderProc. Model-free: 42 templates are rendered from a NeRF trained on a set of multi-view images with known camera poses.
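
The submission does not include its rendering code, so the following is only a minimal sketch (not the authors' implementation) of one common way to obtain 42 evenly spread template viewpoints: a once-subdivided icosahedron has exactly 42 points when edge midpoints are added to its 12 vertices. The sphere radius and the camera look-at convention are assumptions.

```python
# Sketch: 42 viewpoints on a sphere from a once-subdivided icosahedron,
# plus camera-to-world poses looking at the object center. Illustrative only.
import itertools
import numpy as np

def icosphere_42_viewpoints(radius=1.0):
    """Return 42 camera positions on a sphere of the given radius."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    # 12 icosahedron vertices (unnormalized); every edge has length 2.
    verts = np.array([(sx, sy * phi, 0.0) for sx in (-1, 1) for sy in (-1, 1)]
                     + [(0.0, sx, sy * phi) for sx in (-1, 1) for sy in (-1, 1)]
                     + [(sy * phi, 0.0, sx) for sx in (-1, 1) for sy in (-1, 1)])
    # 30 edge midpoints: edges are exactly the vertex pairs at distance 2.
    midpoints = [(verts[i] + verts[j]) / 2.0
                 for i, j in itertools.combinations(range(12), 2)
                 if np.isclose(np.linalg.norm(verts[i] - verts[j]), 2.0)]
    points = np.vstack([verts, midpoints])                       # (42, 3)
    points /= np.linalg.norm(points, axis=1, keepdims=True)
    return radius * points

def look_at_pose(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Camera-to-world pose (4x4), camera looking at `target` (OpenGL-style: -Z forward)."""
    forward = target - cam_pos
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    if np.linalg.norm(right) < 1e-6:                             # viewpoint parallel to `up`
        right = np.array([1.0, 0.0, 0.0])
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2] = right, true_up, -forward
    pose[:3, 3] = cam_pos
    return pose

cam_positions = icosphere_42_viewpoints(radius=0.6)              # radius is a free parameter
poses = [look_at_pose(p) for p in cam_positions]
assert len(poses) == 42
```

The resulting poses can then be fed to any renderer (e.g. BlenderProc for the model-based case, or a trained NeRF for the model-free case) to produce the 42 templates.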

Used 3D models: Default models for all datasets in the model-based challenge.

Notes: The method addresses 6D object pose estimation for novel objects in both model-based and model-free scenarios. It assumes the availability of 2D detections in the form of bounding boxes and object categories. Patch descriptors are extracted from cropped test images and object templates using the frozen DINOv2 [A] feature extractor. A transformer encoder aggregates the template patch descriptors and applies a 3D positional embedding to generate an enhanced object representation. A transformer decoder then establishes correspondences between template and query image patch descriptors. Finally, the method refines the correspondences, selects the top-k templates, establishes 2D-3D correspondences, and estimates the 6D pose via RANSAC-PnP.
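
OPFormer's code is not yet published, so the sketch below is only an illustration of the described coarse pipeline under stated assumptions: DINOv2 ViT-L/14 loaded via torch.hub, off-the-shelf PyTorch transformer layers standing in for the actual encoder/decoder, a learned placeholder instead of the described 3D positional embedding, a cosine-similarity matching head, and OpenCV's solvePnPRansac for the final pose. It is not the authors' implementation, and all module sizes and tensor shapes are assumptions.

```python
# Hedged sketch: frozen DINOv2 patch features -> encoder over template tokens ->
# decoder cross-attending query tokens -> correspondence scores -> RANSAC-PnP.
import torch
import torch.nn as nn
import cv2
import numpy as np

# Frozen DINOv2 backbone from the official hub entry point (ViT-L/14 assumed here).
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").eval()
for p in dinov2.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def patch_tokens(crop_bchw):
    """Per-patch descriptors for an RGB crop resized to a multiple of 14 pixels."""
    out = dinov2.forward_features(crop_bchw)
    return out["x_norm_patchtokens"]                  # (B, N_patches, C)

class TemplateQueryMatcher(nn.Module):
    """Encoder over template tokens (plus a positional-embedding placeholder),
    decoder that cross-attends query tokens to the aggregated templates."""
    def __init__(self, dim=1024, heads=8, layers=2, n_template_tokens=4096):
        super().__init__()
        # Placeholder for the 3D positional embedding described in the notes.
        self.pos3d = nn.Parameter(torch.zeros(1, n_template_tokens, dim))
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.decoder = nn.TransformerDecoder(dec_layer, layers)

    def forward(self, query_tokens, template_tokens):
        # template_tokens: (B, T * N_patches, C) flattened over the 42 templates.
        memory = self.encoder(template_tokens + self.pos3d[:, : template_tokens.shape[1]])
        fused = self.decoder(query_tokens, memory)    # (B, N_query, C)
        # Cosine similarity as a stand-in correspondence score between
        # query patches and template patches.
        return torch.einsum("bqc,btc->bqt",
                            nn.functional.normalize(fused, dim=-1),
                            nn.functional.normalize(memory, dim=-1))

def pose_from_correspondences(pts3d, pts2d, K):
    """6D pose from 2D-3D correspondences via RANSAC-PnP (OpenCV)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        reprojectionError=3.0, flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)
    return ok, R, tvec, inliers
```

In a full pipeline the correspondence scores would additionally drive the refinement and top-k template selection mentioned in the notes before the 2D-3D correspondences are passed to RANSAC-PnP; those steps are omitted here.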

[A] Maxime Oquab et al.: DINOv2: Learning Robust Visual Features without Supervision. Transactions on Machine Learning Research, 2024.

Computer specifications NVIDIA A40 (inference), NVIDIA A100 (training)