Submission: WAPR.v2(Multi 2D Detections)/ITODD

Submission name
Submission time (UTC) March 1, 2026, 2:16 a.m.
User SEU_WYL
Task Model-based 6D localization of unseen objects
Dataset ITODD
Description
Evaluation scores
AR:0.798
AR_MSPD:0.827
AR_MSSD:0.835
AR_VSD:0.731
average_time_per_image:30.255

Method: WAPR.v2(Multi 2D Detections)

User SEU_WYL
Publication
Implementation
Training image modalities RGB-D
Test image modalities RGB-D
Description

WAPR.v2 uses the same zero-shot 2D detector setting as FRTPose-WAPR.v2. Unlike FRTPose-WAPR.v2, it removes the FRTPose component, which makes it a fully zero-shot method. This change increases inference speed at the cost of a slight reduction in accuracy. Like FRTPose-WAPR.v2, WAPR.v2 initializes 24 uniformly sampled candidate poses for each detected object and applies the WAPR module to refine each pose five times. The WAPR module supports wide-angle pose refinement and tolerates initialization errors of up to ±90°. Finally, the refined poses are scored by the FoundationPose pose scoring network, and the pose with the highest score is selected as the final result.
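The per-detection loop described above (refine each candidate five times, score it, keep the best) can be sketched as follows. This is a minimal illustration, not the submission's implementation: `refine` and `score` are hypothetical callables standing in for the WAPR module and the FoundationPose scoring network, and the loop is written sequentially for readability even though the description treats the candidates as a parallel workload.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")

NUM_CANDIDATES = 24   # candidate poses per detected object (from the description)
NUM_REFINEMENTS = 5   # refinement passes per candidate (from the description)


def select_best_pose(
    candidates: Sequence[Pose],
    refine: Callable[[Pose], Pose],   # hypothetical stand-in for the WAPR module
    score: Callable[[Pose], float],   # hypothetical stand-in for the scoring network
    num_refinements: int = NUM_REFINEMENTS,
) -> Tuple[Pose, float]:
    """Refine every candidate pose, score it, and return the best (pose, score)."""
    if not candidates:
        raise ValueError("at least one candidate pose is required")

    best_pose = None
    best_score = float("-inf")
    for pose in candidates:
        # Apply the refinement step a fixed number of times.
        for _ in range(num_refinements):
            pose = refine(pose)
        # Score the refined pose and keep the highest-scoring one.
        current = score(pose)
        if current > best_score:
            best_pose, best_score = pose, current
    return best_pose, best_score
```

In a toy setting, a pose can be a single angle in degrees, with `refine` halving the distance to a target angle and `score` returning the negative absolute error; the function then returns the candidate that ends up closest to the target.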

For a task involving 30 different objects and an image containing 100 detected 2D bounding boxes, the total computational workload can be expressed as 30 (objects) × 100 (detections) × 24 (candidate poses) × (5 (refinements) + 1 (scoring)) = 432,000. These 432,000 operations run in parallel; the figure describes the overall scale of computation, not a count of sequential iterations. Under this configuration, the average computation time per image is the time required to process this parallel workload. If the 2D detection confidence scores and classifications are considered reliable, low-confidence bounding boxes can be coarsely filtered out. This significantly reduces the number of candidate detections and brings the average computation time per image below one second.
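The workload arithmetic above can be reproduced with a few lines of Python. The function and parameter names are illustrative, and the confidence threshold of 0.5 and the count of 5 surviving detections are assumed values chosen only to show how filtering shrinks the workload; the submission does not state them.

```python
from typing import List, Sequence


def pose_workload(
    num_objects: int,
    num_detections: int,
    num_candidates: int = 24,   # candidate poses per detection
    num_refinements: int = 5,   # refinement passes per candidate
    num_scoring: int = 1,       # one scoring pass per candidate
) -> int:
    """Number of refinement and scoring operations for one image."""
    return num_objects * num_detections * num_candidates * (num_refinements + num_scoring)


def filter_by_confidence(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of detections whose confidence meets the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


full = pose_workload(num_objects=30, num_detections=100)
print(full)  # 432000, the figure given in the text

# Hypothetical case: only 5 of the 100 detections survive filtering.
filtered = pose_workload(num_objects=30, num_detections=5)
print(filtered)  # 21600
```

Because the workload is linear in the number of detections, removing detections reduces it in direct proportion.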

In addition, to further reduce computation time, non-maximum suppression (NMS) is applied to remove redundant 2D bounding boxes. This reduces the overall runtime from approximately 120 s to approximately 50 s.
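For reference, a standard greedy NMS over axis-aligned boxes is sketched below. It shows the general technique only: the box format `(x1, y1, x2, y2)` and the IoU threshold of 0.5 are assumptions, and the submission does not say which NMS variant or threshold it uses.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Each box that NMS removes also removes its share of the refinement and scoring work, which is why it lowers the total runtime.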

Computer specifications 5090