Submission: FRTPose-WAPR.v2 (Multi 2D Detections) / T-LESS

Submission time (UTC) Sept. 29, 2025, 9:30 a.m.
User SEU_WYL
Task Model-based 6D localization of unseen objects
Dataset T-LESS
Evaluation scores
AR: 0.834
AR_MSPD: 0.870
AR_MSSD: 0.851
AR_VSD: 0.780
average_time_per_image: 556.694 s

Method: FRTPose-WAPR.v2 (Multi 2D Detections)

User SEU_WYL
Training image modalities RGB-D
Test image modalities RGB-D
Description

Submitted to: BOP Challenge 2025

Training Data: For seen tasks, we use the BOP-provided PBR data. For unseen tasks, we render data from the Google Scanned Objects (GSO) dataset.

Training Process and Network for Unseen Tasks (within 5 minutes): Each object is trained for approximately 4.5 minutes on an RTX 4090. During training, the method renders object images online, using only 1,600 fixed background images and the input object models, with no additional data. A separate ResNet34 model is trained per object. No pre-trained ResNet34 weights are used (including weights pre-trained on object pose estimation tasks); we directly instantiate the default ResNet34 architecture provided by PyTorch, without loading any weight files.
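The sketch below illustrates this per-object setup: a default, non-pretrained torchvision ResNet34 adapted to RGB-D input. The source only specifies the backbone and its outputs (surface coordinates and a mask); the 4-channel stem and the dense prediction head shown here are illustrative assumptions, not the actual architecture.

```python
import torch.nn as nn
from torchvision.models import resnet34

class PerObjectNet(nn.Module):
    # Minimal sketch of the per-object network described above.
    def __init__(self):
        super().__init__()
        backbone = resnet34(weights=None)  # default PyTorch ResNet34, no pre-trained weights
        # Assumed: replace the stem so the network accepts RGB-D (4-channel) input.
        backbone.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        # Keep the convolutional trunk; drop the avgpool and classification head.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        # Hypothetical dense head: 3 channels for 3D surface coordinates plus
        # 1 channel for the object mask (a real head would upsample to input resolution).
        self.head = nn.Conv2d(512, 4, kernel_size=1)

    def forward(self, x):
        feat = self.encoder(x)               # (B, 512, H/32, W/32)
        out = self.head(feat)                # (B, 4, H/32, W/32)
        coords, mask_logits = out[:, :3], out[:, 3:]
        return coords, mask_logits

# One fresh model is trained per object, e.g.:
# model = PerObjectNet()
```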

2D Bounding Box: The 2D detector used is specified in parentheses in the title.

Testing for Unseen Tasks: We feed RGB-D data into the ResNet34, which predicts, for each 2D pixel, the corresponding 3D surface coordinates along with the object mask. We then apply PnP to the resulting 2D-3D correspondences to compute an initial object pose. Next, the pose and mask are fed into WAPR, a pose refinement module trained on GSO data that can correct angular deviations of up to 90°. Finally, we apply FoundationPose for fine-scale pose refinement.
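A minimal sketch of the pose-from-correspondences step, assuming `coords` is the predicted (H, W, 3) surface-coordinate map, `mask` the (H, W) predicted mask, and `K` the 3x3 camera intrinsics. The source only says "PnP"; the RANSAC variant and thresholds here are assumptions.

```python
import numpy as np
import cv2

def pose_from_correspondences(coords, mask, K):
    # Collect 2D pixels inside the predicted mask and their 3D model points.
    v, u = np.nonzero(mask > 0.5)
    pts_2d = np.stack([u, v], axis=1).astype(np.float64)
    pts_3d = coords[v, u].astype(np.float64)
    if len(pts_3d) < 4:                        # EPnP needs at least 4 points
        return None
    # Robust PnP over the dense 2D-3D correspondences (RANSAC is an assumption).
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, distCoeffs=None,
        flags=cv2.SOLVEPNP_EPNP, reprojectionError=3.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                 # rotation vector -> 3x3 matrix
    return R, tvec                             # initial pose, then refined by WAPR / FoundationPose
```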

Testing for Seen Tasks: The testing process is similar to the unseen setup, with two differences: the ResNet34 training time is extended to obtain more accurate surface-coordinate predictions, and the 2D detector is replaced with YOLOv11.

Multi 2D Detections: We combine 2D detections from SAM6D, NIDS, CNOS, and MUSE.
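The source only names the four detectors, not how their outputs are merged; the sketch below shows one plausible fusion scheme (score-ordered cross-source non-maximum suppression), purely as an assumption.

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes [x1, y1, x2, y2].
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def merge_detections(per_source_dets, iou_thresh=0.5):
    """per_source_dets: list of arrays (one per detector) with rows [x1, y1, x2, y2, score]."""
    dets = np.concatenate(per_source_dets, axis=0)
    dets = dets[np.argsort(-dets[:, 4])]       # highest confidence first
    kept = []
    for d in dets:
        # Keep a box only if it does not heavily overlap an already-kept box.
        if all(iou(d[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(d)
    return np.array(kept)
```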

6D Localization and 6D Detection: For 6D localization tasks (within 1 second), we use the object categories known to be present in the image; for all other tasks, we use only the scene and image indices.

The difference between v2 and the original version is that v2 uses a higher-precision WAPR module. In addition, v2 employs a newly trained foundation model for pose scoring, which is applied primarily to textureless object datasets.

Authors: Temporarily anonymous

Computer specifications RTX 5090