Submission: FRTPose.v1 (SAM6D-FastSAM)/YCB-V

Submission time (UTC) Nov. 24, 2024, 7:57 a.m.
User SEU_WYL
Task Model-based 6D detection of unseen objects
Dataset YCB-V
Evaluation scores
AP: 0.863
AP_MSPD: 0.832
AP_MSSD: 0.893
average_time_per_image: 26.448 s

Method: FRTPose.v1 (SAM6D-FastSAM)

User SEU_WYL
Training image modalities RGB-D
Test image modalities RGB-D
Description

Submitted to: BOP Challenge 2024

Training Process: Each object is trained in approximately 4.3 minutes on an RTX 4090D GPU. During training, the method generates object images online (sketched below), using only 1,600 fixed background images and the object models from the datasets, without relying on any additional data. (A significant improvement in v1 is that high-performance GPUs such as the H100 are no longer required; training can be completed efficiently on cost-effective GPUs such as the RTX 4090D.)
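
A minimal sketch of what this online generation could look like, under assumptions: render_rgbd and sample_pose are hypothetical helpers standing in for an off-the-shelf renderer and a pose sampler, since the submission does not publish its generation code.

```python
import random
import numpy as np

def generate_training_sample(obj_model, backgrounds, render_rgbd, sample_pose):
    # Draw a random 6D pose and render the object model at that pose.
    # render_rgbd / sample_pose are hypothetical stand-ins for an
    # off-the-shelf renderer and a pose sampler.
    pose = sample_pose()
    rgb, depth, mask = render_rgbd(obj_model, pose)  # (H,W,3), (H,W), (H,W) bool
    # Composite the rendering over one of the 1,600 fixed backgrounds;
    # nothing beyond the object models and these backgrounds is used.
    bg = random.choice(backgrounds)                  # (H,W,3) uint8 image
    rgb = np.where(mask[..., None], rgb, bg)
    return rgb, depth, mask, pose
```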

3D Models: For T-LESS, the reconstructed 3D models are used.

Network: A separate ResNet34 is trained for each object. (We did not use any pre-trained ResNet34 weights, including weights from any object pose estimation task. During training, we directly instantiate the default ResNet34 provided by PyTorch, without modifying the architecture's initialization or loading any weight files.)
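
A minimal sketch of this instantiation, assuming torchvision as the PyTorch model source; the 4-channel RGB-D stem is an illustrative assumption, as the exact input layout is not published.

```python
import torch.nn as nn
from torchvision.models import resnet34

def build_object_network(in_channels: int = 4) -> nn.Module:
    # weights=None requests torchvision's default random initialization:
    # no ImageNet or pose-estimation checkpoint is ever loaded.
    net = resnet34(weights=None)
    # Hypothetical adaptation: widen the stem so the backbone accepts
    # a 4-channel RGB-D input instead of 3-channel RGB.
    net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                          padding=3, bias=False)
    return net

# One independent network per object; YCB-V has 21 objects (IDs 1-21).
networks = {obj_id: build_object_network() for obj_id in range(1, 22)}
```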

2D Bounding Box: The 2D detector used is given in parentheses in the method title (here, the FastSAM variant of SAM-6D).

Epochs: 6000

Batch Size per Epoch: 24

Testing: We feed the RGB-D input into the ResNet34, which outputs the 3D surface coordinates corresponding to the 2D pixel coordinates, together with the object mask. We then compute the object pose from these 2D-3D correspondences using PnP (see the sketch below). Finally, we pass the resulting pose and mask to FoundationPose, which we modified so that the PnP-based pose is incorporated during its uniform pose sampling.
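
A hedged sketch of the PnP step, assuming OpenCV's RANSAC-based solver; the variable names and the 3-pixel reprojection threshold are illustrative choices, not taken from the submission.

```python
import cv2
import numpy as np

def pose_from_surface_coords(coords_3d, mask, K):
    """Recover a 6D pose from the network's per-pixel 2D-3D correspondences.

    coords_3d: (H, W, 3) predicted 3D surface coordinates in the model frame.
    mask:      (H, W) boolean object mask predicted by the network.
    K:         (3, 3) camera intrinsic matrix.
    """
    ys, xs = np.nonzero(mask)
    pts_2d = np.stack([xs, ys], axis=1).astype(np.float64)  # pixel coordinates
    pts_3d = coords_3d[ys, xs].astype(np.float64)           # matching model points
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, None,
        flags=cv2.SOLVEPNP_EPNP, reprojectionError=3.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # axis-angle vector -> 3x3 rotation matrix
    return R, tvec

```

Per the description above, the pose returned here would then be handed, together with the mask, to the modified FoundationPose, which uses it to seed the otherwise uniform pose-hypothesis sampling before refinement.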

Authors: Temporarily anonymous

Computer specifications RTX 4090D