Submission name |
---|---
Submission time (UTC) | Sept. 7, 2024, 5:16 a.m.
User | SEU_WYL
Task | Model-based 6D localization of unseen objects
Dataset | HB
Description |
Evaluation scores |
User | SEU_WYL
---|---
Publication |
Implementation |
Training image modalities | RGB-D
Test image modalities | RGB-D
Description | Submitted to: BOP Challenge 2024. Training process: each object is trained for approximately 4.5 minutes on an H100 GPU. During training, the method renders object images online, relying solely on 1600 fixed background images and the input object models; no additional data is used. 3D models: reconstructed 3D models for T-LESS. Network: a separate ResNet34 model is trained for each object. (No pre-trained ResNet34 weights are used, including weights from any object pose estimation task; the default ResNet34 provided by PyTorch is instantiated directly, without modifying the weights or loading any weight files.) 2D bounding box: the 2D detector used is specified in parentheses in the title. Epochs: 6000. Batch size per epoch: 24. Testing: RGB-D data is fed into ResNet34, which outputs the 3D surface coordinates corresponding to 2D pixel coordinates, together with the object mask. PnP is then used to compute the object pose from these 2D-3D correspondences. Finally, the pose and mask are passed to FoundationPose, which was modified to incorporate the PnP-based pose during uniform pose sampling (illustrative sketches of these steps follow the tables). Because H100 GPUs were only intermittently available, testing time is measured on an RTX 4090D. Authors: Yulin Wang, Jianghao Zhou, Hongli Li and Chen Luo
Computer specifications | RTX 4090D
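
The description mentions online rendering over 1600 fixed background images during training. The submission does not name its rendering library, so the sketch below only illustrates the compositing idea; `render_object` is a hypothetical placeholder for whatever renderer produces an object crop, and the return layout is an assumption.

```python
# Hypothetical online-rendering step for training: composite a rendered
# object over one of the fixed background images. render_object() is a
# placeholder; the submission does not name its rendering library.
import random
import numpy as np

def make_training_sample(render_object, backgrounds):
    """render_object() -> (rgb, depth, mask, xyz_map) for a random pose.
    backgrounds: list of (H, W, 3) uint8 images loaded once at startup."""
    rgb, depth, mask, xyz_map = render_object()
    bg = random.choice(backgrounds)                    # one of the 1600 fixed backgrounds
    composite = np.where(mask[..., None] > 0, rgb, bg)  # object pixels over background
    return composite, depth, xyz_map, mask             # network inputs and supervision targets
```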
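For the per-object network, here is a minimal PyTorch sketch: torchvision's default (randomly initialized) ResNet34 as an RGB-D encoder, followed by a small upsampling decoder predicting a 3-channel object-coordinate map and a 1-channel mask. The 4-channel input stem, the decoder design, and the channel counts are assumptions for illustration; the submission only states that the default, untrained ResNet34 from PyTorch is used.

```python
# Hypothetical sketch of the per-object correspondence network described above.
# Assumptions (not specified in the submission): 4-channel RGB-D input stem,
# small transposed-convolution decoder, XYZ + mask output heads.
import torch
import torch.nn as nn
from torchvision.models import resnet34

class CorrespondenceNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet34(weights=None)  # default architecture, no pre-trained weights
        # Replace the RGB stem with a 4-channel (RGB + depth) stem.
        backbone.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Keep everything up to the last residual stage (drops avgpool/fc).
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        # Upsample the 1/32-resolution features back toward input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 4, 1),  # 3 channels of object-frame XYZ + 1 mask logit
        )

    def forward(self, rgbd):  # rgbd: (B, 4, H, W)
        feat = self.encoder(rgbd)
        out = self.decoder(feat)
        xyz, mask_logit = out[:, :3], out[:, 3:]
        return xyz, mask_logit
```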
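The pose-from-correspondences step can then be a standard RANSAC PnP over the masked pixels. The sketch below assumes the coordinate map and mask produced by a network like the one above, plus pinhole intrinsics `K`; the thresholds and the choice of EPnP inside RANSAC are illustrative defaults, not values taken from the submission.

```python
# Hypothetical PnP step: recover the object pose from the predicted
# dense 2D-3D correspondences. Thresholds are illustrative assumptions.
import cv2
import numpy as np

def pose_from_correspondences(xyz_map, mask, K, mask_thresh=0.5):
    """xyz_map: (H, W, 3) predicted object-frame coordinates per pixel.
    mask: (H, W) predicted object mask (probabilities).
    K: (3, 3) camera intrinsics. Returns (R, t) or None."""
    v, u = np.nonzero(mask > mask_thresh)           # pixel rows/cols inside the mask
    if len(u) < 6:                                  # too few correspondences for PnP
        return None
    pts_2d = np.stack([u, v], axis=1).astype(np.float64)
    pts_3d = xyz_map[v, u].astype(np.float64)       # matching 3D surface points
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, distCoeffs=None,
        reprojectionError=3.0, iterationsCount=150,
        flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                      # rotation vector -> 3x3 matrix
    return R, tvec
```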
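Finally, the description says FoundationPose was modified so that the PnP-based pose is incorporated during uniform pose sampling. The actual modification is not public in this submission; the generic sketch below only illustrates the idea of injecting one extra hypothesis into a sampled rotation set before refinement and scoring. `sample_uniform_rotations` and `translation_init` are hypothetical placeholders, not FoundationPose APIs.

```python
# Generic illustration of injecting a PnP pose into a pool of uniformly
# sampled pose hypotheses. None of these helpers are FoundationPose APIs;
# they stand in for its renderer-based refinement and scoring stages.
import numpy as np

def build_hypotheses(pnp_pose, sample_uniform_rotations, translation_init):
    """pnp_pose: 4x4 pose from PnP, or None if PnP failed.
    Returns a list of 4x4 pose hypotheses for refinement/scoring."""
    hypotheses = []
    for R in sample_uniform_rotations():        # e.g. rotations from icosphere viewpoints
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = translation_init             # e.g. translation from detection + depth
        hypotheses.append(T)
    if pnp_pose is not None:
        hypotheses.append(pnp_pose)             # the modification: add the PnP-based pose
    return hypotheses
```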