Method: 3PT-Pose-Industrial (f.k.a. IPT)

User: IPT
Publication: Anonymous
Implementation: PyTorch
Views: Multi
Test image modalities: RGB
Description

IPT-Pose-Industrial: A Two-Stage Transformer for Pose Estimation

We are submitting IPT-Pose-Industrial to the BOP Challenge 2025.

This is a two-stage foundation model composed of an object detector (IPT-Detection) and a pose refinement network (IPT-Pose). IPT-Detection is a one-shot, image- and CAD-prompted object detection network. It uses a vision transformer backbone to simultaneously regress 2D bounding boxes, coarse object orientations, and object classifications. Initial poses are estimated from the detector's outputs and passed to the pose refinement network, which refines them using point-to-point correspondences across multiple views.
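The description does not say how initial poses are derived from the detector outputs. One common approach, shown here purely as an illustrative sketch and not as the submitted method, back-projects the 2D bounding box with a pinhole camera model: depth comes from the ratio of a known object diameter to the box size, and the lateral offset comes from the box center. The names `Intrinsics` and `coarse_translation`, and the use of the object diameter, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical container)."""
    fx: float
    fy: float
    cx: float
    cy: float


def coarse_translation(bbox, diameter_m, K):
    """Estimate a coarse object translation (x, y, z) in meters.

    bbox is (x_min, y_min, x_max, y_max) in pixels. The longer box side
    is ASSUMED to span the object diameter; this is a heuristic, not the
    authors' procedure.
    """
    x_min, y_min, x_max, y_max = bbox
    width_px = x_max - x_min
    height_px = y_max - y_min
    if width_px <= 0 or height_px <= 0:
        raise ValueError("degenerate bounding box")

    # Depth from apparent size: z = f * real_size / pixel_size.
    if width_px >= height_px:
        z = K.fx * diameter_m / width_px
    else:
        z = K.fy * diameter_m / height_px

    # Back-project the box center at that depth.
    u = 0.5 * (x_min + x_max)
    v = 0.5 * (y_min + y_max)
    x = (u - K.cx) * z / K.fx
    y = (v - K.cy) * z / K.fy
    return (x, y, z)
```

Combined with the coarse orientation regressed by the detector, a translation of this kind would give a full initial pose that a refinement stage could then improve.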

Dataset and Training Strategy

Our model is trained exclusively on large-scale synthetic data. The data is generated by rendering scenes in Blender, using a diverse set of over 100,000 unique CAD models drawn from public CAD model collections and other sources.

The training set comprises over 500,000 synthetically rendered images, intended to provide robustness and generalization across a wide range of object instances and environmental conditions.

Onboarding Procedure (less than 5 minutes per object)

For each new CAD model, a set of reference templates is generated that shows the model in various canonical orientations. These templates serve as references during IPT's inference.
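The number and distribution of template orientations are not specified. As a minimal sketch, assuming viewpoints are sampled roughly uniformly on a sphere around the CAD model, a Fibonacci lattice gives evenly spread camera directions, and a look-at construction turns each direction into a rotation. `fibonacci_viewpoints`, `look_at_rotation`, and the OpenCV-style camera axes (x right, y down, z forward) are hypothetical choices for illustration.

```python
import math


def fibonacci_viewpoints(n):
    """Return n roughly evenly spaced unit vectors on the sphere."""
    if n < 1:
        raise ValueError("n must be positive")
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    views = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n            # strictly inside (-1, 1)
        radius = math.sqrt(max(0.0, 1.0 - z * z))
        theta = golden_angle * i
        views.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return views


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def look_at_rotation(camera_position):
    """World-to-camera rotation (rows: right, down, forward) for a camera
    at camera_position looking at the origin, where the model is assumed
    to sit."""
    forward = _normalize(tuple(-c for c in camera_position))
    up_hint = (0.0, 0.0, 1.0)
    if abs(forward[2]) > 0.999:                  # avoid a degenerate cross product
        up_hint = (0.0, 1.0, 0.0)
    right = _normalize(_cross(forward, up_hint))
    down = _cross(forward, right)                # unit length: forward and right are orthonormal
    return (right, down, forward)
```

Each rotation, paired with a fixed camera distance, would define one render of the CAD model; the template count trades onboarding time against orientation coverage.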

Specifics

We use 4 views for every dataset except IPD, where we use only 3. We use only the RGB image from each view, so each sensor is treated as RGB-only. Pose accuracy comes from pixel-accurate multi-view pose refinement.
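The refinement network itself is not described beyond its use of multi-view correspondences. As a rough illustration of the kind of quantity a multi-view refinement can drive down, the sketch below computes the mean reprojection error of 3D points across several calibrated RGB views. It is a generic geometric formulation and not the authors' network; `project`, `multiview_reprojection_error`, and the `(R, t, K)` camera tuple layout are hypothetical.

```python
import math


def project(point, R, t, K):
    """Project a 3D world point into one view.

    R is a 3x3 world-to-camera rotation (sequence of rows), t a 3-vector,
    and K = (fx, fy, cx, cy) the pinhole intrinsics in pixels.
    """
    xc = [sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    fx, fy, cx, cy = K
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def multiview_reprojection_error(points, observations, cameras):
    """Mean pixel distance between projected and observed points.

    observations[v][i] is the observed pixel of points[i] in view v,
    and cameras[v] is the (R, t, K) tuple for that view.
    """
    total, count = 0.0, 0
    for (R, t, K), view_obs in zip(cameras, observations):
        for point, (u_obs, v_obs) in zip(points, view_obs):
            u, v = project(point, R, t, K)
            total += math.hypot(u - u_obs, v - v_obs)
            count += 1
    if count == 0:
        raise ValueError("no correspondences given")
    return total / count
```

With 3 or 4 views, a pose error that is nearly invisible along one camera's viewing direction shows up as a lateral pixel shift in the others, which is one reason RGB-only multi-view setups can recover depth without a depth sensor.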

Authors: Temporary Anonymity

Computer specifications: V100

Public submissions

Date              Submission name  Dataset
2025-10-01 20:56  -                IPD
2025-10-01 20:58  -                XYZ-IBD
2025-10-01 21:00  -                ITODD-MV
2025-10-01 21:05  -                IPD
2025-10-01 21:05  -                XYZ-IBD
2025-10-01 21:05  -                ITODD-MV
2025-10-01 23:49  -                XYZ-IBD