Submission: WAPR.v2.1(MUSE)/IC-BIN

Submission name
Submission time (UTC) Feb. 25, 2026, 6:26 a.m.
User SEU_WYL
Task Model-based 6D localization of unseen objects
Dataset IC-BIN
Description
Evaluation scores
AR:0.709
AR_MSPD:0.718
AR_MSSD:0.725
AR_VSD:0.685
average_time_per_image:0.726

Method: WAPR.v2.1(MUSE)

User SEU_WYL
Publication
Implementation
Training image modalities RGB-D
Test image modalities RGB-D
Description

WAPR.v2 uses the same zero-shot 2D detector setting as FRTPose-WAPR.v2, but removes the FRTPose component, resulting in a fully zero-shot pipeline. This change improves inference speed while causing a slight reduction in accuracy. For each detected 2D bounding box, WAPR.v2 initializes 12 uniformly sampled pose hypotheses and refines each hypothesis for five iterations using the WAPR module. The refinement is wide-angle and tolerates initialization errors of up to ±90°. The refined hypotheses are then evaluated by the FoundationPose pose scoring network, and the highest-scoring pose is returned as the final prediction.
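The per-detection loop described above (a fixed set of hypotheses, a fixed number of refinement iterations, then scoring and selection) can be sketched as follows. This is a minimal illustration, not the WAPR.v2 implementation: `refine_fn` and `score_fn` are hypothetical placeholders standing in for the WAPR refinement module and the FoundationPose scoring network, and the toy usage at the bottom represents a pose as a single angular error in degrees purely for demonstration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")


def select_best_pose(
    initial_hypotheses: Sequence[Pose],
    refine_fn: Callable[[Pose], Pose],   # hypothetical stand-in for the refiner
    score_fn: Callable[[Pose], float],   # hypothetical stand-in for the scorer
    num_iterations: int = 5,
) -> Tuple[Pose, float]:
    """Refine every hypothesis, score the results, and return the best one."""
    refined: List[Pose] = []
    for pose in initial_hypotheses:
        # Apply the refiner a fixed number of times to each hypothesis.
        for _ in range(num_iterations):
            pose = refine_fn(pose)
        refined.append(pose)

    # Score each refined hypothesis and keep the highest-scoring one.
    scores = [score_fn(p) for p in refined]
    best_index = max(range(len(refined)), key=scores.__getitem__)
    return refined[best_index], scores[best_index]


# Toy usage (assumption): a "pose" is an angular error in degrees,
# refinement halves the error, and the score is the negative absolute error.
toy_hypotheses = [-165.0 + 30.0 * i for i in range(12)]  # 12 evenly spaced values
best_pose, best_score = select_best_pose(
    toy_hypotheses,
    refine_fn=lambda err: err / 2.0,
    score_fn=lambda err: -abs(err),
)
```

In a real pipeline the hypotheses would be processed as one batch on the GPU rather than in a Python loop; the loop form is used here only to make the control flow explicit.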

The computational cost can be approximated by the total number of pose operations. For a task with 30 objects and an image containing 100 detected 2D boxes, the workload is 30 (objects) × 100 (detections) × 12 (hypotheses) × (5 refinements + 1 scoring) = 216,000 pose operations. These operations are executed in parallel rather than one after another, so the figure indicates the overall scale of the computation, not a count of sequential iterations. The average runtime per image is therefore the time required to process this parallel workload.
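The workload estimate above can be reproduced with a few lines of arithmetic. The function name and parameter names are illustrative; the default values (12 hypotheses, 5 refinements, 1 scoring pass) are the ones stated in the description.

```python
def pose_operations(
    num_objects: int,
    num_detections: int,
    num_hypotheses: int = 12,
    num_refinements: int = 5,
    num_scorings: int = 1,
) -> int:
    """Total pose operations: each hypothesis is refined and then scored."""
    ops_per_hypothesis = num_refinements + num_scorings
    return num_objects * num_detections * num_hypotheses * ops_per_hypothesis


# The example from the text: 30 objects and 100 detected 2D boxes.
total_ops = pose_operations(num_objects=30, num_detections=100)
print(total_ops)  # 216000
```

Because every factor enters multiplicatively, halving the number of surviving detections or hypotheses halves the estimate.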

To further reduce runtime, two filtering stages are applied. First, low-confidence 2D detections are removed before pose refinement, which directly reduces the number of candidate boxes. In the 6D detection (localization) setting, a fixed confidence threshold of 0.35 is used for all datasets and all objects. Second, coarse pruning is performed during the early refinement iterations to discard unlikely pose hypotheses, which reduces the total number of refinements executed in later stages.

All hyperparameters are kept identical across datasets and objects, as required by the BOP challenge setting, including the number of pose hypotheses, the refinement iterations, and the filtering thresholds. Under this standardized configuration, WAPR.v2 satisfies the runtime criterion of the 6D detection task, achieving an average runtime below one second per image.
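The two filtering stages can be illustrated with the sketch below. Only the 0.35 confidence threshold comes from the description. Everything else is an assumption made for illustration: the detection representation (a label paired with a confidence), the inclusive comparison at the threshold, and in particular the pruning rule, which here simply keeps the `keep` hypotheses with the highest intermediate score. The description does not state how WAPR.v2 decides which hypotheses are unlikely.

```python
from typing import List, Sequence, Tuple, TypeVar

Hypothesis = TypeVar("Hypothesis")

CONF_THRESHOLD = 0.35  # fixed threshold stated in the description


def filter_detections(
    detections: Sequence[Tuple[str, float]],
    threshold: float = CONF_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Stage 1: drop 2D detections whose confidence is below the threshold.

    Keeping detections exactly at the threshold is an assumption.
    """
    return [det for det in detections if det[1] >= threshold]


def prune_hypotheses(
    hypotheses: Sequence[Hypothesis],
    coarse_scores: Sequence[float],
    keep: int,
) -> List[Hypothesis]:
    """Stage 2 (hypothetical rule): keep the `keep` best-scoring hypotheses.

    The original order of the surviving hypotheses is preserved.
    """
    ranked = sorted(range(len(hypotheses)), key=lambda i: coarse_scores[i], reverse=True)
    survivors = sorted(ranked[:keep])
    return [hypotheses[i] for i in survivors]


# Toy usage with made-up values.
kept_detections = filter_detections([("a", 0.9), ("b", 0.2), ("c", 0.35)])
kept_hypotheses = prune_hypotheses(["h0", "h1", "h2", "h3"], [0.1, 0.8, 0.5, 0.7], keep=2)
```

Both stages shrink factors in the workload product, so their savings compound: fewer boxes enter refinement, and fewer hypotheses per box reach the later iterations.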

Computer specifications