BOP: Benchmark for 6D Object Pose Estimation

Submission: Flose/T-LESS

Submission name

Submission time (UTC)

March 3, 2026, 2:41 p.m.

User

ahamza

Task

Model-based 6D localization of seen objects

Dataset

T-LESS

Description

Evaluation scores

AR:	0.869
AR_MSPD:	0.895
AR_MSSD:	0.885
AR_VSD:	0.827
average_time_per_image (s):	2.379
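
In the BOP evaluation protocol, the overall AR score is the mean of the three per-metric recalls (AR_VSD, AR_MSSD, AR_MSPD). The short check below reproduces the reported AR from the values listed above. It is only a consistency sketch: the variable names are illustrative, and the inputs are the rounded scores shown on this page rather than the full-precision values.

```python
# Per-metric average recalls copied from the scores listed above (rounded).
scores = {"AR_VSD": 0.827, "AR_MSSD": 0.885, "AR_MSPD": 0.895}

# Overall AR is the arithmetic mean of the three per-metric recalls.
ar = sum(scores.values()) / len(scores)

print(f"AR = {ar:.3f}")
```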

Method: Flose

User	ahamza
Publication	Generative 6D pose estimation via conditional flow matching - https://arxiv.org/abs/2602.19719
Implementation	PyTorch
Training image modalities	RGB-D
Test image modalities	RGB-D
Description	Training data: real + provided PBR images.
Used 3D models: CAD models for T-LESS and ITODD, default models for the other datasets.
Setting: one network was trained per dataset.
Segmentation: we use predicted segmentation masks from the "Model-based 2D segmentation of seen objects" track. Specifically, we take the segmentation results from ZebraPose and select the mask with the highest confidence score.
Method: we use a generative approach that performs 6D pose estimation via conditional flow matching. It combines an overlap-aware encoder based on PTv3 for geometric feature extraction, frozen DINOv2 features to resolve symmetry ambiguities, a DiT-based flow model that predicts the velocity fields used to align point clouds, and RANSAC-based registration for robust pose estimation. Poses are then refined via ICP.
Reference: Generative 6D pose estimation via conditional flow matching - https://arxiv.org/abs/2602.19719
Authors: Amir Hamza, Davide Boscaini, Weihang Li, Benjamin Busam, Fabio Poiesi
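
To illustrate what "predicting velocity fields to align point clouds" can mean at inference time, here is a minimal sketch of Euler integration along a straight-line flow-matching path. It is not the submission's implementation: the `velocity` function is a hypothetical closed-form stand-in for the learned DiT-based model, it is handed the target position directly (which a real model would have to infer from its conditioning), and the point coordinates and step count are made up for the example.

```python
def velocity(x, t, target):
    """Hypothetical stand-in for a learned flow model.

    For a straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    that carries the current point x to the target x_1 is (x_1 - x) / (1 - t).
    """
    return [(b - a) / (1.0 - t) for a, b in zip(x, target)]


def integrate(x0, target, steps=10):
    """Euler-integrate the velocity field from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so velocity() never divides by zero
        v = velocity(x, t, target)
        x = [a + dt * b for a, b in zip(x, v)]
    return x


# Illustrative 3D point (assumed values, not taken from the submission).
source_point = [0.0, 0.0, 0.0]
target_point = [0.1, -0.2, 0.3]

aligned_point = integrate(source_point, target_point)
print(aligned_point)
```

In a pipeline like the one described, the displaced points would then supply correspondences for a rigid registration step such as RANSAC followed by ICP. That part is omitted here.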
Computer specifications	NVIDIA A100 SXM4 64GB GPU, 32-core Intel Xeon CPU