Submission: FRTPose (SAM6D-FastSAM)/HB

Submission name FRTPose (SAM6D-FastSAM)/HB
Submission time (UTC) Sept. 4, 2024, 11:58 a.m.
User SEU_WYL
Task Model-based 6D localization of unseen objects
Dataset HB
Description
Evaluation scores
AR: 0.896
AR_MSPD: 0.908
AR_MSSD: 0.907
AR_VSD: 0.872
average_time_per_image: 41.448 s

Method: FRTPose (SAM6D-FastSAM)

User SEU_WYL
Publication
Implementation
Training image modalities RGB-D
Test image modalities RGB-D
Description

Submitted to: BOP Challenge 2024

Training Process: Each object is trained for approximately 4.5 minutes using an H100 GPU. During training, the method renders object images online, relying solely on 1600 fixed background images and the input object models. No additional data is used.
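As a rough illustration of the online data synthesis, the sketch below composites a rendered object crop over one of the fixed background images. The renderer itself and the exact augmentation pipeline are not described in this submission, so the RGBA input format and the helper name are assumptions.

import random
import numpy as np

# Hypothetical helper: paste a rendered object crop (H, W, 4 float array
# in [0, 1], alpha in the last channel) over a randomly chosen fixed
# background image (H, W, 3 uint8). The real renderer and augmentations
# used by the method are not specified in this description.
def composite_over_background(render_rgba, backgrounds):
    bg = random.choice(backgrounds).astype(np.float32) / 255.0
    rgb, alpha = render_rgba[..., :3], render_rgba[..., 3:4]
    return alpha * rgb + (1.0 - alpha) * bg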

3D Models: For T-LESS, the reconstructed 3D models are used.

Network: A separate ResNet34 model is trained for each object. (We did not use any pre-trained ResNet34 weights, including weights trained for any object pose estimation task. During training, we directly call the default ResNet34 provided by PyTorch, without loading any weight files.)
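For concreteness, here is a minimal sketch of how such a per-object network could be set up. The submission only confirms that the default torchvision ResNet34 is instantiated with random weights; the 4-channel RGB-D input adaptation and the dense head (three surface-coordinate channels plus one mask channel per pixel) are illustrative assumptions.

import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet34

class CoordMaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet34(weights=None)  # default ResNet34, random init, no weight files
        # Assumed adaptation: accept 4-channel RGB-D input instead of RGB.
        net.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.encoder = nn.Sequential(*list(net.children())[:-2])  # drop avgpool/fc
        # Assumed head: 3 surface-coordinate channels + 1 mask logit per pixel.
        self.head = nn.Conv2d(512, 4, kernel_size=1)

    def forward(self, rgbd):  # rgbd: (B, 4, H, W)
        feats = self.encoder(rgbd)  # (B, 512, H/32, W/32)
        out = F.interpolate(self.head(feats), size=rgbd.shape[-2:],
                            mode="bilinear", align_corners=False)
        return out[:, :3], out[:, 3:]  # 3D surface coords, mask logit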

2D Bounding Box: The 2D detector used is specified in parentheses in the title.

Epochs: 6000

Batch Size per Epoch: 24

Testing: We feed the RGB-D input into ResNet34, which outputs the 3D surface coordinate corresponding to each 2D pixel, together with the object mask. We then use PnP to compute the object pose from these 2D-3D correspondences. Finally, we pass the estimated pose and mask to FoundationPose, which we modified so that the PnP-based pose is incorporated during its uniform pose sampling. Because H100 GPUs were only intermittently available, testing time was measured on an RTX 4090D.
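To make the PnP step concrete, the sketch below uses OpenCV's RANSAC PnP solver, assuming the per-pixel 3D surface coordinates and mask have already been predicted. The variable names and RANSAC settings are illustrative and not taken from the submission.

import cv2
import numpy as np

def pose_from_correspondences(coords_3d, mask, K):
    # coords_3d: (H, W, 3) predicted model-space surface coordinates
    # mask: (H, W) boolean object mask; K: (3, 3) camera intrinsics
    vs, us = np.nonzero(mask)
    pts_2d = np.stack([us, vs], axis=1).astype(np.float64)  # pixel (u, v)
    pts_3d = coords_3d[vs, us].astype(np.float64)           # matched 3D points
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, distCoeffs=None,
        reprojectionError=3.0, flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    return R, tvec              # object-to-camera pose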

Authors: Temporarily anonymous

Computer specifications RTX 4090D