Submission name | |
---|---|
Submission time (UTC) | Nov. 2, 2024, 12:05 p.m. |
User | SEU_WYL |
Task | Model-based 6D localization of unseen objects |
Dataset | HB |
Description | |
Evaluation scores | |

User | SEU_WYL |
---|---|
Publication | |
Implementation | |
Training image modalities | RGB-D |
Test image modalities | RGB-D |
Description | Submitted to: BOP Challenge 2024. Training process: Each object is trained in approximately 4.3 minutes on an RTX 4090D GPU. During training, the method generates object images online, using only 1,600 fixed background images and the object models from the datasets, with no additional data. (A significant improvement in version v1 is that high-performance GPUs such as the H100 are no longer needed; training now completes efficiently on cost-effective GPUs such as the RTX 4090D.) 3D models: Reconstructed 3D models for T-LESS. Network: A separate ResNet34 model is trained for each object. (We did not use any pre-trained ResNet34 weights, including weights from any object pose estimation task. During training, we used the default ResNet34 provided by PyTorch without modifying its weights or loading any weight files.) 2D bounding box: The 2D detector used is specified in parentheses in the title. Epochs: 6000. Batch size per epoch: 24. Testing: We input RGB-D data into ResNet34, which outputs the object mask and, for each 2D pixel coordinate, the corresponding 3D surface coordinate. We then use PnP to compute the object pose from the 2D-3D correspondences. Finally, we pass the object pose and mask to FoundationPose, which we modified to include the PnP-based pose during uniform pose sampling. Authors: Temporary anonymity |
Computer specifications | RTX 4090D |
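
The testing step described above (predicted 3D surface coordinates plus a mask, then PnP) can be illustrated with a minimal sketch. This is not the submission's code: the function names `correspondences_from_prediction` and `pnp_dlt`, the mask threshold, and the array layouts are assumptions made for illustration. The description does not say which PnP solver is used, so a plain Direct Linear Transform (DLT) with NumPy stands in here; a robust pipeline would more likely use a RANSAC-based solver.

```python
import numpy as np


def correspondences_from_prediction(coord_map, mask, threshold=0.5):
    """Collect 2D-3D correspondences from a network prediction.

    coord_map: (H, W, 3) predicted 3D surface coordinates in the object frame
               (assumed layout).
    mask:      (H, W) predicted object mask; pixels above `threshold` are kept.
    Returns pixel coordinates as (u, v) and the matching 3D points.
    If the network ran on a detector crop, the crop offset would still have to
    be added to (u, v) before solving for the pose.
    """
    vs, us = np.nonzero(mask > threshold)
    pts_2d = np.stack([us, vs], axis=1).astype(np.float64)
    pts_3d = coord_map[vs, us].astype(np.float64)
    return pts_2d, pts_3d


def pnp_dlt(pts_2d, pts_3d, K):
    """Estimate rotation R and translation t so that x ~ K (R X + t).

    Plain DLT without outlier rejection: needs at least 6 non-coplanar points.
    """
    n = pts_2d.shape[0]
    if n < 6:
        raise ValueError("DLT needs at least 6 correspondences")

    # Remove the intrinsics so that the unknown matrix is [R | t] up to scale.
    homog_2d = np.hstack([pts_2d, np.ones((n, 1))])
    norm = (np.linalg.inv(K) @ homog_2d.T).T
    x = norm[:, 0] / norm[:, 2]
    y = norm[:, 1] / norm[:, 2]

    # Two linear equations per correspondence in the 12 entries of [R | t].
    X = np.hstack([pts_3d, np.ones((n, 1))])
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = X
    A[0::2, 8:12] = -x[:, None] * X
    A[1::2, 4:8] = X
    A[1::2, 8:12] = -y[:, None] * X

    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1].reshape(3, 4)

    # Fix the sign ambiguity so the rotation part has positive determinant.
    if np.linalg.det(M[:, :3]) < 0:
        M = -M

    # Project the 3x3 block onto the nearest rotation and recover the scale.
    U, S, Vt_r = np.linalg.svd(M[:, :3])
    R = U @ Vt_r
    t = M[:, 3] / S.mean()
    return R, t
```

The resulting `(R, t)` corresponds to the PnP-based pose that the description says is then handed to FoundationPose together with the mask; how that hand-off is implemented is not specified in the submission and is not sketched here.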