Submission: OPFormer-Coarse (CNOS)/YCB-V/Coarse Main

Submission name Coarse Main
Submission time (UTC) Jan. 29, 2025, 8:47 p.m.
User morozart
Task Model-based 6D localization of unseen objects
Dataset YCB-V
Description
Evaluation scores
AR:0.646
AR_MSPD:0.782
AR_MSSD:0.613
AR_VSD:0.543
average_time_per_image:0.398

Method: OPFormer-Coarse (CNOS)

User morozart
Publication Not yet
Implementation
Training image modalities RGB-D
Test image modalities RGB
Description

Submitted to:  BOP Challenge 2024

Training data:  MegaPose-ShapeNet and MegaPose-GSO synthetic datasets

Onboarding data: Model-based: 42 templates are rendered from the CAD model using BlenderProc. Model-free: 42 templates are rendered from a NeRF trained on a set of multi-view images with known camera poses.
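
The submission does not include its rendering code, so the following is only a minimal sketch (not the authors' implementation) of one common way to obtain 42 evenly spread template viewpoints: a once-subdivided icosahedron has exactly 42 points when edge midpoints are added to its 12 vertices. The sphere radius and the camera look-at convention are assumptions.

```python
# Sketch: 42 viewpoints on a sphere from a once-subdivided icosahedron,
# plus camera-to-world poses looking at the object center. Illustrative only.
import itertools
import numpy as np

def icosphere_42_viewpoints(radius=1.0):
    """Return 42 camera positions on a sphere of the given radius."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    # 12 icosahedron vertices (unnormalized); every edge has length 2.
    verts = np.array([(sx, sy * phi, 0.0) for sx in (-1, 1) for sy in (-1, 1)]
                     + [(0.0, sx, sy * phi) for sx in (-1, 1) for sy in (-1, 1)]
                     + [(sy * phi, 0.0, sx) for sx in (-1, 1) for sy in (-1, 1)])
    # 30 edge midpoints: edges are exactly the vertex pairs at distance 2.
    midpoints = [(verts[i] + verts[j]) / 2.0
                 for i, j in itertools.combinations(range(12), 2)
                 if np.isclose(np.linalg.norm(verts[i] - verts[j]), 2.0)]
    points = np.vstack([verts, midpoints])                       # (42, 3)
    points /= np.linalg.norm(points, axis=1, keepdims=True)
    return radius * points

def look_at_pose(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Camera-to-world pose (4x4), camera looking at `target` (OpenGL-style: -Z forward)."""
    forward = target - cam_pos
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    if np.linalg.norm(right) < 1e-6:                             # viewpoint parallel to `up`
        right = np.array([1.0, 0.0, 0.0])
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2] = right, true_up, -forward
    pose[:3, 3] = cam_pos
    return pose

cam_positions = icosphere_42_viewpoints(radius=0.6)              # radius is a free parameter
poses = [look_at_pose(p) for p in cam_positions]
assert len(poses) == 42
```

The resulting poses can then be fed to any renderer (e.g. BlenderProc for the model-based case, or a trained NeRF for the model-free case) to produce the 42 templates.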

Used 3D models: Default models for all datasets in the model-based challenge.

Notes: The method addresses 6D object pose estimation for novel objects in both model-based and model-free scenarios. It assumes the availability of 2D detections in the form of bounding boxes and object categories. Patch descriptors are extracted from cropped test images and object templates using the frozen DINOv2 [A] feature extractor. A transformer encoder aggregates the template patch descriptors and applies a 3D positional embedding to generate an enhanced object representation. A transformer decoder then establishes correspondences between template and query image patch descriptors. Finally, the method refines the correspondences, selects the top-k templates, establishes 2D-3D correspondences, and estimates the 6D pose via RANSAC-PnP.
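
OPFormer's code is not yet published, so the sketch below is only an illustration of the described coarse pipeline under stated assumptions: DINOv2 ViT-L/14 loaded via torch.hub, off-the-shelf PyTorch transformer layers standing in for the actual encoder/decoder, a learned placeholder instead of the described 3D positional embedding, a cosine-similarity matching head, and OpenCV's solvePnPRansac for the final pose. It is not the authors' implementation, and all module sizes and tensor shapes are assumptions.

```python
# Hedged sketch: frozen DINOv2 patch features -> encoder over template tokens ->
# decoder cross-attending query tokens -> correspondence scores -> RANSAC-PnP.
import torch
import torch.nn as nn
import cv2
import numpy as np

# Frozen DINOv2 backbone from the official hub entry point (ViT-L/14 assumed here).
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").eval()
for p in dinov2.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def patch_tokens(crop_bchw):
    """Per-patch descriptors for an RGB crop resized to a multiple of 14 pixels."""
    out = dinov2.forward_features(crop_bchw)
    return out["x_norm_patchtokens"]                  # (B, N_patches, C)

class TemplateQueryMatcher(nn.Module):
    """Encoder over template tokens (plus a positional-embedding placeholder),
    decoder that cross-attends query tokens to the aggregated templates."""
    def __init__(self, dim=1024, heads=8, layers=2, n_template_tokens=4096):
        super().__init__()
        # Placeholder for the 3D positional embedding described in the notes.
        self.pos3d = nn.Parameter(torch.zeros(1, n_template_tokens, dim))
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.decoder = nn.TransformerDecoder(dec_layer, layers)

    def forward(self, query_tokens, template_tokens):
        # template_tokens: (B, T * N_patches, C) flattened over the 42 templates.
        memory = self.encoder(template_tokens + self.pos3d[:, : template_tokens.shape[1]])
        fused = self.decoder(query_tokens, memory)    # (B, N_query, C)
        # Cosine similarity as a stand-in correspondence score between
        # query patches and template patches.
        return torch.einsum("bqc,btc->bqt",
                            nn.functional.normalize(fused, dim=-1),
                            nn.functional.normalize(memory, dim=-1))

def pose_from_correspondences(pts3d, pts2d, K):
    """6D pose from 2D-3D correspondences via RANSAC-PnP (OpenCV)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        reprojectionError=3.0, flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)
    return ok, R, tvec, inliers
```

In a full pipeline the correspondence scores would additionally drive the refinement and top-k template selection mentioned in the notes before the 2D-3D correspondences are passed to RANSAC-PnP; those steps are omitted here.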

[A] Maxime Oquab et al.: DINOv2: Learning Robust Visual Features without Supervision. Transactions on Machine Learning Research, 2024.

Computer specifications NVIDIA A40 (inference), NVIDIA A100 (training)