BOP: Benchmark for 6D Object Pose Estimation

Submission: Zhigang-CDPN-ICCV19/LM-O/Zhigang-CDPN-ICCV19

Submission name

Zhigang-CDPN-ICCV19

Submission time (UTC)

Oct. 22, 2019, 2:38 a.m.

User

ZhigangLi

Task

Model-based 6D localization of seen objects

Dataset

LM-O

Training model type

Default

Training image type

Synthetic (custom)

Description

For training data, we generated 10,000 synthetic images for each object. Details can be found in the description of our method.

Evaluation scores

AR:	0.374
AR_MSPD:	0.558
AR_MSSD:	0.329
AR_VSD:	0.234
average_time_per_image:	0.331
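
The overall AR value above can be cross-checked against the three per-metric scores. The sketch below assumes the BOP Challenge 2019 convention that AR is the unweighted mean of AR_VSD, AR_MSSD and AR_MSPD; this page does not state that convention itself, and the function name `average_recall` is made up for illustration.

```python
# Assumption: AR is the unweighted mean of the three per-metric recalls
# (BOP Challenge 2019 convention; not stated on this page).
def average_recall(ar_vsd: float, ar_mssd: float, ar_mspd: float) -> float:
    """Return the mean of the three pose-error recall scores."""
    return (ar_vsd + ar_mssd + ar_mspd) / 3.0


# Scores reported for this submission on LM-O.
ar = average_recall(ar_vsd=0.234, ar_mssd=0.329, ar_mspd=0.558)
print(f"AR = {ar:.3f}")
```

Rounded to three decimals, the result agrees with the reported AR of 0.374.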

Method: Zhigang-CDPN-ICCV19

User	ZhigangLi
Publication	CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation
Implementation	https://github.com/LZGMatrix/BOP19_CDPN_2019ICCV
Training image modalities	RGB
Test image modalities	RGB
Description	Our method is mainly based on CDPN (ICCV 2019). The results were obtained with fixed training and test parameters for all datasets and all objects. For detection, unlike in the paper, we used RetinaNet with an R-101-FPN backbone. For TUD-L, YCB-V and T-LESS, we used the provided real images for training. For the other datasets, we generated 10,000 synthetic images for each dataset. We trained one detector per dataset. Each detector was trained for 30 epochs, with 4 images and 3 workers per GPU and a learning rate of 5e-4. For the CDPN model, unlike in the paper, both rotation and translation were solved from the predicted confidence map and coordinates map via PnP. We used a classification loss instead of a regression loss for the coordinates map and the confidence map. We used ResNet-18 instead of ResNet-34 as the backbone. Both the input resolution and the coordinates-map resolution are 128×128. We introduced dilated convolution layers in the backbone and added skip connections between the backbone and the head network. We used Adam with an initial learning rate of 0.001 for optimization. The learning rate was halved every 20 epochs. The CDPN model was trained for 160 epochs with a batch size of 32. We trained one CDPN model per object. For training data, we used real training images where available; otherwise, we generated 10,000 synthetic images for each object. During training, the background of the input was randomly sampled from the PASCAL VOC 2012 dataset.
Computer specifications	CPU: Intel i7-7700; GPU: GTX 1070; Memory: 16G
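
The learning-rate schedule given in the method description (initial rate 0.001, halved every 20 epochs, 160 epochs in total) can be sketched as a step decay. This is an illustration, not the authors' training code: the function name `cdpn_learning_rate`, the zero-based epoch index and the exact epoch at which each halving takes effect are assumptions.

```python
# Minimal sketch of a step-decay schedule matching the description.
# Assumptions: epochs are counted from 0 and the rate is halved at
# epochs 20, 40, ..., 140; the authors' exact boundaries may differ.
BASE_LR = 1e-3       # initial learning rate for Adam (from the description)
HALVE_EVERY = 20     # epochs between halvings (from the description)
TOTAL_EPOCHS = 160   # total training epochs (from the description)


def cdpn_learning_rate(epoch: int) -> float:
    """Return the learning rate used during the given zero-based epoch."""
    return BASE_LR * 0.5 ** (epoch // HALVE_EVERY)


# Print the rate at the start of each 20-epoch stage.
for epoch in range(0, TOTAL_EPOCHS, HALVE_EVERY):
    print(f"epoch {epoch:3d}: lr = {cdpn_learning_rate(epoch):.3e}")
```

Under these assumptions the rate passes through eight stages and ends at 0.001 / 128 in the final 20 epochs. In PyTorch, the same behaviour would correspond to `torch.optim.lr_scheduler.StepLR` with `step_size=20` and `gamma=0.5`.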