| Submission name | |
|---|---|
| Submission time (UTC) | Jan. 29, 2026, 4:18 p.m. |
| User | alignpose |
| Task | Model-based 6D detection of unseen objects |
| Dataset | IPD |
| Description | The upstream 3PT-Detection detector produces a large number of 2D detections per scene. Since our method refines each detection individually, the runtime scales with the detection count. To reduce computation, we used the provided target object IDs to filter out detections of non-target objects. This does not affect AP scores, but it decreases the runtime, which would result in an unfair comparison with other methods. We therefore do not report time, though it exceeds 200 s due to the large number of detections. |
| Evaluation scores |
|
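The target-ID filtering mentioned in the submission description can be illustrated with a minimal sketch. This is not the submission's code: the detection record layout (a dict with an `obj_id` field) and the function name `filter_to_targets` are assumptions made purely for illustration.

```python
from typing import Iterable


def filter_to_targets(detections: list[dict], target_obj_ids: Iterable[int]) -> list[dict]:
    """Keep only detections whose (assumed) 'obj_id' field is a target object."""
    targets = set(target_obj_ids)  # set gives O(1) membership checks
    return [det for det in detections if det["obj_id"] in targets]


# Hypothetical detections for one scene; only objects 1 and 2 are targets.
detections = [
    {"obj_id": 1, "score": 0.9},
    {"obj_id": 4, "score": 0.8},
    {"obj_id": 1, "score": 0.3},
]
kept = filter_to_targets(detections, target_obj_ids=[1, 2])
```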
| User | alignpose |
|---|---|
| Publication | |
| Implementation | |
| Training image modalities | None |
| Test image modalities | RGB |
| Description | Detections: 3PT. Single-view: FoundPose + FeatRef + MegaPose. Multi-view: AlignPose. The presented results were obtained with the AlignPose [1] multi-view pipeline. Each view is first processed independently using 2D detections from 3PT-Detection and SAM2 [2] segmentations. Initial pose estimates are obtained for each view with the single-view method FoundPose [3] and refined with FoundPose feature-metric refinement and MegaPose [4] refinement. The AlignPose pipeline then produces multi-view consistent poses by aggregating all single-view candidates with non-maximum suppression and refining them with multi-view feature-metric refinement. [1] Anonymous: AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment |
| Computer specifications |
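The aggregation of single-view candidates by non-maximum suppression can be sketched as a greedy procedure over poses. This is a generic illustration, not the AlignPose implementation: it assumes all candidates are already expressed in a common world frame, ignores object symmetries, and uses hypothetical field names (`obj_id`, `score`, `R`, `t`) and hypothetical thresholds (`max_trans`, `max_rot_deg`).

```python
import numpy as np


def rotation_angle_deg(R_a: np.ndarray, R_b: np.ndarray) -> float:
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def pose_nms(candidates: list[dict], max_trans: float = 0.02, max_rot_deg: float = 15.0) -> list[dict]:
    """Greedy NMS: visit candidates by descending score and drop any that is
    close (in translation and rotation) to an already kept pose of the same object."""
    kept: list[dict] = []
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        duplicate = any(
            k["obj_id"] == cand["obj_id"]
            and np.linalg.norm(k["t"] - cand["t"]) <= max_trans
            and rotation_angle_deg(k["R"], cand["R"]) <= max_rot_deg
            for k in kept
        )
        if not duplicate:
            kept.append(cand)
    return kept


# Hypothetical candidates from three views (translations in metres).
I = np.eye(3)
candidates = [
    {"obj_id": 1, "score": 0.7, "R": I, "t": np.array([0.005, 0.0, 0.5])},
    {"obj_id": 1, "score": 0.9, "R": I, "t": np.array([0.0, 0.0, 0.5])},
    {"obj_id": 1, "score": 0.6, "R": I, "t": np.array([0.2, 0.0, 0.5])},
]
merged = pose_nms(candidates)
```

Here the two nearly coincident candidates collapse onto the higher-scoring one, while the distant candidate survives as a separate instance.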