Method: Flose

User: ahamza
Publication: Generative 6D pose estimation via conditional flow matching - https://arxiv.org/abs/2602.19719
Implementation: PyTorch
Views: Single
Test image modalities: RGB-D
Description

Training data: real + provided PBR

Used 3D models: CAD models for T-LESS and ITODD, default models for the other datasets.

Setting:

We train one network per dataset. We use predicted segmentation masks from the "Model-based 2D segmentation of seen objects" track: specifically, we take the segmentation results from ZebraPose and select the mask with the highest confidence score. Pose estimation is generative and based on conditional flow matching. The pipeline consists of an overlap-aware encoder based on PTv3 for geometric feature extraction, frozen DINOv2 features to resolve symmetry ambiguities, a DiT-based flow model that predicts the velocity fields used to align point clouds, and RANSAC-based registration for robust pose estimation. Poses are refined via ICP.
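
The role of the velocity field can be illustrated with a toy sketch. This is not the submitted implementation: the function names (integrate_flow, make_oracle_velocity), the straight-line probability path, the Euler integrator and the step count are all assumptions made for illustration. In the method described above, a DiT-based network would predict the velocities; here a hypothetical oracle that already knows the target stands in for it, so that the sampling loop can be run without a trained model.

```python
from typing import Callable, List

Points = List[List[float]]
VelocityFn = Callable[[Points, float], Points]


def integrate_flow(points: Points, velocity_fn: VelocityFn, num_steps: int = 10) -> Points:
    """Move points from t=0 to t=1 with forward Euler steps along a velocity field."""
    dt = 1.0 / num_steps
    current = [list(p) for p in points]  # copy so the input is left untouched
    for step in range(num_steps):
        t = step * dt  # t stays strictly below 1 inside the loop
        velocities = velocity_fn(current, t)
        current = [
            [c + dt * v for c, v in zip(p, vel)]
            for p, vel in zip(current, velocities)
        ]
    return current


def make_oracle_velocity(target: Points) -> VelocityFn:
    """Hypothetical stand-in for a learned model (assumption, not the DiT).

    For the straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    that carries x_t to x_1 in the remaining time is (x_1 - x_t) / (1 - t).
    """
    def velocity_fn(points: Points, t: float) -> Points:
        scale = 1.0 / (1.0 - t)
        return [
            [(q - c) * scale for c, q in zip(p, tgt)]
            for p, tgt in zip(points, target)
        ]
    return velocity_fn


# Toy data: the target is the source rotated 90 degrees about z, then translated.
source = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
target = [[0.5, 0.2, 1.0], [0.5, 1.2, 1.0], [-0.5, 0.2, 1.0]]

aligned = integrate_flow(source, make_oracle_velocity(target), num_steps=10)
```

With the oracle field, the integrated points land on the target up to floating-point error. A learned field would only approximate this, which is presumably why the description follows the flow model with RANSAC-based registration and ICP refinement to recover a rigid pose.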

Reference: Generative 6D pose estimation via conditional flow matching - https://arxiv.org/abs/2602.19719

Authors: Amir Hamza, Davide Boscaini, Weihang Li, Benjamin Busam, Fabio Poiesi

Computer specifications: NVIDIA A100 SXM4 64GB GPU, 32-core Intel Xeon CPU

Public submissions

Date              Submission name  Dataset
2026-03-03 14:40  -                TUD-L
2026-03-03 14:40  -                LM-O
2026-03-03 14:40  -                YCB-V
2026-03-03 14:41  -                IC-BIN
2026-03-03 14:41  -                T-LESS
2026-03-03 14:42  -                ITODD
2026-03-03 14:42  -                HB