|Submission time (UTC)||Oct. 22, 2019, 2:38 a.m.|
|Task||6D localization of seen objects|
|Training model type||Default|
|Training image type||Synthetic (custom)|
|Description||For training data, we generated 10,000 synthetic images for each object. The details can be found in the description of our method.|
|Publication||CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation|
|Training image modalities||RGB|
|Test image modalities||RGB|
Our method is mainly based on CDPN (ICCV 2019). The results are obtained with fixed training and test parameters for all datasets and all objects.
For detection, unlike the paper, we used RetinaNet with an R-101-FPN backbone. For TUD-L, YCB-V, and T-LESS, we used the provided real images for training; for the other datasets, we generated 10,000 synthetic images each. We trained one detector per dataset, for 30 epochs, with 4 images and 3 workers per GPU and a learning rate of 5e-4.
For the CDPN model, unlike the paper, both rotation and translation are solved via PnP from the predicted confidence map and coordinate map. We used a classification loss instead of a regression loss for the coordinate map and confidence map, and ResNet-18 instead of ResNet-34 as the backbone. Both the input resolution and the coordinate-map resolution are 128×128. We introduced dilated convolution layers in the backbone and added skip connections between the backbone and the head network. For optimization we used Adam with an initial learning rate of 0.001, halved every 20 epochs. The CDPN model was trained for 160 epochs with a batch size of 32, one model per object. For training data, we used real images where available; otherwise, we generated 10,000 synthetic images per object. During training, the background of each input was randomly sampled from the PASCAL VOC 2012 dataset.
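Solving the pose from the predicted maps amounts to a PnP problem: pixels whose confidence exceeds a threshold supply 2D image points, and the coordinate map supplies the corresponding 3D object-space points. The following is a minimal numpy-only sketch of that step, using a plain DLT solver rather than the (presumably robust, RANSAC-based) PnP used in the actual submission; the function names and the confidence threshold are illustrative, not from the paper:

```python
import numpy as np

def correspondences_from_maps(coord_map, conf_map, conf_thresh=0.5):
    """Turn an HxWx3 coordinate map and HxW confidence map into 2D-3D pairs.

    Pixel locations (u, v) above the confidence threshold are the 2D points;
    the predicted object-space coordinates at those pixels are the 3D points.
    (conf_thresh is an illustrative value, not from the submission.)
    """
    vs, us = np.nonzero(conf_map > conf_thresh)
    pts2d = np.stack([us, vs], axis=1).astype(float)
    pts3d = coord_map[vs, us]
    return pts3d, pts2d

def pnp_dlt(pts3d, pts2d, K):
    """Recover (R, t) from 2D-3D correspondences with a Direct Linear Transform.

    Each correspondence contributes two rows to the homogeneous system A p = 0,
    where p is the flattened 3x4 projection matrix P = s * K [R | t].
    """
    n = pts3d.shape[0]
    A = np.zeros((2 * n, 12))
    for i, ((X, Y, Z), (u, v)) in enumerate(zip(pts3d, pts2d)):
        A[2 * i]     = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u]
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v]
    # Least-squares null vector: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    M = np.linalg.inv(K) @ P              # M = s * [R | t], unknown scale s
    s = np.cbrt(np.linalg.det(M[:, :3]))  # det(s * R) = s^3, so this recovers s (with sign)
    M = M / s
    # Project the left 3x3 block onto the nearest rotation matrix.
    U, _, Vt2 = np.linalg.svd(M[:, :3])
    R = U @ Vt2
    t = M[:, 3]
    return R, t
```

With noise-free correspondences the DLT is exact; in practice a robust solver is preferred because the coordinate map contains outlier predictions.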
|Computer specifications||CPU: Intel i7-7700; GPU: GTX 1070; RAM: 16 GB|