Submission name | |
---|---|
Submission time (UTC) | May 23, 2025, 6:41 a.m. |
User | SEU_WYL |
Task | Model-based 6D localization of unseen objects |
Dataset | IC-BIN |
Description | |
Evaluation scores | |
User | SEU_WYL |
---|---|
Publication | |
Implementation | |
Training image modalities | RGB-D |
Test image modalities | RGB-D |
Description | See the method description below. |
Computer specifications | RTX 4090D |

**Method description**

- **Submitted to:** BOP Challenge 2025.
- **Training data:** For seen tasks, we use the BOP-provided PBR data. For unseen tasks, we render training data from the Google Scanned Objects (GSO) dataset.
- **Training process and network for unseen tasks (within the 5-minute budget):** Each object is trained for approximately 4.5 minutes on an RTX 4090. During training, the method renders object images online, using only 1600 fixed background images and the input object models, without any additional data (see the compositing sketch below). A separate ResNet34 model is trained per object. No pre-trained ResNet34 models are used (including models pre-trained on object pose estimation tasks); we instantiate the default ResNet34 provided by PyTorch with randomly initialized weights, loading no weight files (see the network sketch below).
- **2D bounding box:** The 2D detector used is specified in parentheses in the submission title.
- **Testing for unseen tasks:** We feed RGB-D data into the ResNet34, which outputs the 3D surface coordinate corresponding to each 2D pixel, along with the object mask. We then apply PnP to compute the object pose from these 2D-3D correspondences (see the PnP sketch below). Next, the pose and mask are fed into WAPR, a pose refinement module trained on GSO data that corrects angular deviations of up to 90°. Finally, we apply FoundationPose for small-scale pose refinement.
- **Testing for seen tasks:** The process is the same as for unseen tasks, with two differences: we extend the ResNet34 training time to obtain more accurate surface coordinate predictions, and we replace the 2D detector with YOLOv11 (see the detection sketch below).
- **Multiple 2D detections:** We use SAM6D-FastSAM, NIDS, CNOS, and MUSE.
- **SAM6D / MUSE:** For fast detection on the BOP-Industrial datasets, we use MUSE on ITODD-MV and SAM6D on IPD and XYZIBD.
- **Multi-camera setup:** For IPD, both seen and unseen tasks use multi-camera setups. In unseen tasks, the ResNet34-based network is not employed, in line with the OpenCV BPC Challenge's definition of one-shot (zero-shot) tasks. On ITODD-MV, both seen and unseen tasks use only a single camera.
- **6D localization and 6D detection:** For the 6D localization tasks with the 1-second budget, we use the object categories present in the image. For all other tasks, we use only the scene and image indices.
- **Authors:** Temporary Anonymity.
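To make the online-rendering step concrete, the following is a minimal sketch of the background-compositing part of such a pipeline. It is not the authors' code: the `backgrounds/` directory, function name, and image size are assumptions; only the idea of pasting online-rendered objects over a fixed pool of background images comes from the description, and the renderer that produces `render_rgb` and `render_mask` is out of scope here.

```python
# Hypothetical sketch of the background step during online rendering: paste a
# rendered object (with its mask) over one of a fixed pool of background images.
# File layout and helper names are assumptions, not the authors' code.
import random
from pathlib import Path

import numpy as np
from PIL import Image

BACKGROUNDS = sorted(Path("backgrounds/").glob("*.jpg"))  # fixed pool (1600 images in the submission)

def composite_training_image(render_rgb, render_mask, size=(480, 640)):
    """render_rgb: (H, W, 3) uint8 rendered object; render_mask: (H, W) bool."""
    bg = Image.open(random.choice(BACKGROUNDS)).convert("RGB")
    bg = np.asarray(bg.resize((size[1], size[0])), dtype=np.uint8)  # PIL expects (W, H)
    out = bg.copy()
    out[render_mask] = render_rgb[render_mask]  # object pixels overwrite the background
    return out
```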
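The per-object network could be instantiated as below. This is a hedged sketch, not the submission's architecture: the 4-channel RGB-D stem and the coordinate/mask head are assumptions added for illustration; the only details taken from the description are that the default torchvision ResNet34 is used with randomly initialized weights (no weight files loaded) and that it maps RGB-D input to per-pixel 3D surface coordinates plus an object mask.

```python
# Minimal sketch, not the authors' code. It illustrates the stated setup: the
# default torchvision ResNet34 with no pretrained weights. The 4-channel stem
# and the coordinate/mask head are hypothetical additions for RGB-D input.
import torch.nn as nn
from torchvision.models import resnet34

class SurfaceCoordNet(nn.Module):
    """Per-object network: RGB-D in, 3D surface coordinates + mask out."""
    def __init__(self):
        super().__init__()
        backbone = resnet34(weights=None)  # random init, no weight files loaded
        # Hypothetical: widen the stem from 3 to 4 channels for RGB-D input.
        backbone.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool/fc
        # Hypothetical head: 3 surface-coordinate channels + 1 mask-logit channel.
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(512, 4, kernel_size=1),
        )

    def forward(self, rgbd):                 # rgbd: (B, 4, H, W)
        feats = self.encoder(rgbd)           # (B, 512, H/32, W/32)
        out = self.head(feats)               # (B, 4, H/8, W/8)
        return out[:, :3], out[:, 3:]        # surface coords, mask logits
```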
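The correspondence-to-pose step (predicted surface coordinates + mask, solved with PnP) could look like the following sketch using OpenCV's RANSAC-based PnP solver. The function name, thresholds, minimum point count, and the choice of `SOLVEPNP_EPNP` are illustrative assumptions; only the use of PnP on predicted 2D-3D correspondences comes from the description.

```python
# Minimal sketch of the pose-from-correspondences step, assuming the network
# outputs per-pixel 3D surface coordinates and a mask; thresholds are illustrative.
import cv2
import numpy as np

def pose_from_correspondences(coords, mask_logits, K, mask_thresh=0.5):
    """coords: (3, H, W) predicted 3D surface coords in the object frame;
    mask_logits: (H, W); K: (3, 3) camera intrinsics.
    Returns a 3x3 rotation and a 3x1 translation, or None on failure."""
    mask = 1 / (1 + np.exp(-mask_logits)) > mask_thresh      # sigmoid + threshold
    v, u = np.nonzero(mask)                                  # masked pixel coordinates
    if len(u) < 6:                                           # too few correspondences
        return None
    pts_2d = np.stack([u, v], axis=1).astype(np.float64)     # (N, 2) image points
    pts_3d = coords[:, v, u].T.astype(np.float64)            # (N, 3) object-frame points
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, distCoeffs=None,
        reprojectionError=3.0, iterationsCount=100,
        flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                               # rotation vector -> matrix
    return R, tvec
```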
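For the seen-task detector, a YOLOv11 call via the Ultralytics package might look as follows. Whether the submission uses this package, this weights file, or this confidence threshold is not stated; all of it is assumed for illustration.

```python
# Illustrative only: the Ultralytics YOLO11 API with an assumed weights file
# ("yolo11n.pt") and confidence threshold; the submission's detector config is unknown.
from ultralytics import YOLO

detector = YOLO("yolo11n.pt")               # load a YOLO11 model
results = detector("scene_rgb.png", conf=0.25)

for box in results[0].boxes:                # one Results object per input image
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding box in pixel coordinates
    cls_id = int(box.cls)                   # predicted object category
    score = float(box.conf)                 # detection confidence
    print(cls_id, score, (x1, y1, x2, y2))
```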