| User | IPT |
|---|---|
| Publication | Anonymous |
| Implementation | PyTorch |
| Views | Single |
| Test image modalities | RGB |
| Description | IPT-Detection: A Pretrained Transformer for CAD Prompted Detection. We are submitting IPT-Detection to the BOP Challenge 2025. This foundation model is designed for one-shot, image- and CAD-prompted object detection. It employs a vision transformer backbone to simultaneously regress 2D bounding boxes, object classifications, and a coarse orientation estimate from a single RGB image. Dataset and Training Strategy: Our model is trained exclusively on large-scale synthetic data. The data is generated by rendering scenes in Blender, using a diverse collection of over 100,000 unique CAD models gathered from public CAD model collections and other sources. The network is trained on over 500,000 synthetically rendered images to ensure robustness and generalization across a wide range of object instances and environmental conditions. Onboarding Procedure (less than 5 minutes per object): For each new CAD model, a set of reference templates is rendered, showing the CAD model in various canonical orientations. These templates are embedded into tokens that are then used for matching. The process takes 7-15 s per CAD model. No training or fine-tuning is required. For the CAD model-free approach, we used real masked photographs of the object instead of the rendered templates. Note on runtime: The foundation model was designed for industrial applications where a single CAD model is searched for in a fixed depth range. Since we cannot make those assumptions in this challenge, we ran the model at 3 different depth ranges per CAD model. So if a dataset had 40 potential CAD models, we had to run the model ~120 times per scene. The inference time of a single forward pass is only 0.23-0.9 s on a V100, depending on image resolution and depth range. Authors: Temporary Anonymity |
| Computer specifications | V100 |
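
The runtime note in the description implies a simple per-scene cost estimate. The sketch below works through that arithmetic using only the figures quoted there (3 depth ranges per CAD model, 0.23-0.9 s per forward pass on a V100). The function names are hypothetical, and the sketch assumes the forward passes run sequentially with no batching or other overhead, which the description does not confirm.

```python
def forward_passes_per_scene(num_cad_models: int, depth_ranges: int = 3) -> int:
    """Forward passes needed when each CAD model is searched at each depth range."""
    return num_cad_models * depth_ranges


def scene_runtime_bounds(
    num_cad_models: int,
    depth_ranges: int = 3,
    t_min: float = 0.23,  # fastest quoted single-pass time in seconds
    t_max: float = 0.9,   # slowest quoted single-pass time in seconds
) -> tuple[float, float]:
    """Lower and upper per-scene runtime in seconds.

    Assumption (not stated in the submission): passes run one after
    another on a single GPU, with no batching and no extra overhead.
    """
    passes = forward_passes_per_scene(num_cad_models, depth_ranges)
    return passes * t_min, passes * t_max


# Example from the description: a dataset with 40 candidate CAD models.
passes = forward_passes_per_scene(40)
low, high = scene_runtime_bounds(40)
print(f"{passes} passes, roughly {low:.1f}-{high:.1f} s per scene")
```

Under these assumptions, a 40-model dataset needs 120 passes, or roughly 27.6 to 108 s per scene. Actual reported times may differ if passes are batched or if the image resolution varies across datasets.
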

| Date | Submission name | Dataset |
|---|---|---|
| 2025-10-01 20:23 | - | YCB-V |
| 2025-10-01 20:25 | - | IC-BIN |
| 2025-10-01 20:27 | - | TUD-L |
| 2025-10-01 20:28 | - | T-LESS |
| 2025-10-01 20:28 | - | LM-O |
| 2025-10-01 20:30 | - | ITODD |
| 2025-10-01 20:30 | - | HB |
| 2025-10-01 20:30 | - | IPD |
| 2025-10-01 20:30 | - | XYZ-IBD |
| 2025-10-01 20:31 | - | ITODD-MV |
| 2025-10-01 20:43 | - | HOT3D |
| 2025-10-01 20:45 | - | HANDAL |
| 2025-10-01 20:45 | - | HOPEv2 |
| 2025-10-01 20:48 | - | IPD |
| 2025-10-01 20:48 | - | ITODD-MV |
| 2025-10-01 20:48 | - | XYZ-IBD |
| 2025-10-01 21:18 | - | HOPEv2 |
| 2025-10-01 21:49 | - | HANDAL |
| 2025-10-01 23:26 | - | HOT3D |