## BOP Challenge 2022

### 5. How to participate

For the pose estimation task, the instructions for participation are the same as in BOP Challenge 2019. For the detection/segmentation task, participants should submit a file that follows the format described below and includes predictions for the set of images defined in the test_targets_bop19.json files (these files are included in the base archives of the datasets; the set of considered images is the same for all tasks).
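As a minimal sketch, the images that need predictions can be collected from a test_targets_bop19.json file. The sketch assumes each entry is a dictionary with `scene_id` and `im_id` keys (alongside object-level keys such as `obj_id` and `inst_count`), so one image can appear several times. Check the file in your dataset archive before relying on these key names; `target_images` and `load_target_images` are hypothetical helpers, not part of the BOP Toolkit.

```python
import json
from typing import Iterable, List, Tuple


def target_images(targets: Iterable[dict]) -> List[Tuple[int, int]]:
    """Return sorted, de-duplicated (scene_id, im_id) pairs from target entries.

    Assumption: each entry has "scene_id" and "im_id" keys; an image may be
    listed once per target object, hence the de-duplication via a set.
    """
    return sorted({(int(t["scene_id"]), int(t["im_id"])) for t in targets})


def load_target_images(path: str) -> List[Tuple[int, int]]:
    """Load a test_targets_bop19.json file and return the images to process."""
    with open(path, "r", encoding="utf-8") as f:
        return target_images(json.load(f))


if __name__ == "__main__":
    example = [
        {"scene_id": 48, "im_id": 3, "obj_id": 1, "inst_count": 1},
        {"scene_id": 48, "im_id": 3, "obj_id": 5, "inst_count": 1},
        {"scene_id": 49, "im_id": 0, "obj_id": 2, "inst_count": 2},
    ]
    print(target_images(example))  # [(48, 3), (49, 0)]
```

Since the detection/segmentation task evaluates precision, the target entries are used here only to decide which images to run on, not which objects to look for.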

Results for all tasks should be submitted via this form. The online evaluation system uses the script eval_bop19_pose.py to evaluate pose estimation results and the script eval_bop22_coco.py to evaluate detection/segmentation results.

#### 5.1 Format of detection/segmentation results

We adopt and extend the format of the COCO Object Detection Challenge. Participants need to submit one JSON file per BOP dataset with results in the following format:

```
[
  {
    "scene_id"     : int,
    "image_id"     : int,
    "category_id"  : int,
    "score"        : float,
    "bbox"         : [x,y,width,height],
    "segmentation" : RLE,
    "time"         : float
  },
  ...
]
```
- scene_id, image_id, and category_id are the IDs of the scene, the image, and the object, respectively.
- score is the confidence of the estimate (the range of confidence values is not restricted).
- bbox is a list of four integers defining the predicted amodal 2D bounding box (amodal = encompassing the whole object silhouette, including the occluded parts). The box has size (width, height) and its top-left corner is at (x, y).
- segmentation is a dictionary defining the predicted modal binary mask (covering only the visible part of the object) in the run-length encoding (RLE) format. A binary mask can be converted to RLE using the binary_mask_to_rle() function from the BOP Toolkit.
- time is the time the method took to detect and/or segment all objects in image image_id from scene scene_id. All estimates with the same scene_id and image_id must have the same value of time. Report the wall time from the point right after the raw data (the image, 3D object models, etc.) is loaded to the point when the final detections/segmentations are available, as a single real number in seconds (-1 if not available).
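For submissions, use binary_mask_to_rle() from the BOP Toolkit. To show what the segmentation dictionary contains, here is a pure-Python sketch of uncompressed COCO-style RLE. It assumes the toolkit produces this same uncompressed form: the mask is scanned in column-major order, and `counts` alternates between runs of 0s and 1s, always starting with a (possibly empty) run of 0s. `mask_to_rle` and `rle_to_mask` are illustrative helpers, not toolkit functions.

```python
from typing import Dict, List, Sequence


def mask_to_rle(mask: Sequence[Sequence[int]]) -> Dict[str, List[int]]:
    """Encode a binary mask (list of rows) as uncompressed COCO-style RLE."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts: List[int] = []
    current = 0  # RLE always starts by counting background (0) pixels
    run = 0
    for col in range(width):          # column-major (Fortran) scan order
        for row in range(height):
            value = 1 if mask[row][col] else 0
            if value != current:      # run ends: store it and switch value
                counts.append(run)
                current = value
                run = 0
            run += 1
    counts.append(run)                # store the final run
    return {"counts": counts, "size": [height, width]}


def rle_to_mask(rle: Dict[str, List[int]]) -> List[List[int]]:
    """Decode uncompressed RLE back into a list-of-rows binary mask."""
    height, width = rle["size"]
    flat: List[int] = []
    value = 0
    for count in rle["counts"]:
        flat.extend([value] * count)
        value = 1 - value
    # flat is column-major: pixel (row, col) sits at index col * height + row
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


if __name__ == "__main__":
    mask = [[0, 1, 1],
            [0, 1, 0]]
    rle = mask_to_rle(mask)
    print(rle)  # {'counts': [2, 3, 1], 'size': [2, 3]}
    assert rle_to_mask(rle) == mask
```

The leading zero-run convention is why a mask whose top-left pixel is foreground begins with a count of 0.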

If your method produces only bounding boxes or only segmentation masks, omit the missing entry (bbox or segmentation) from the file.
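A minimal sketch of assembling result entries under these rules: the key for a missing output is left out of the dictionary, and a final check enforces that all estimates for one (scene_id, image_id) share the same time value. `make_result` and `check_times` are hypothetical helpers written for illustration; they are not part of the BOP Toolkit or the evaluation scripts.

```python
import json
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Set, Tuple


def make_result(scene_id: int, image_id: int, category_id: int,
                score: float, time_s: float,
                bbox: Optional[Sequence[int]] = None,
                segmentation: Optional[dict] = None) -> dict:
    """Build one result entry, leaving out "bbox" or "segmentation" if absent."""
    if bbox is None and segmentation is None:
        raise ValueError("an entry needs a bounding box, a mask, or both")
    entry = {
        "scene_id": int(scene_id),
        "image_id": int(image_id),
        "category_id": int(category_id),
        "score": float(score),
        "time": float(time_s),
    }
    if bbox is not None:
        entry["bbox"] = [int(v) for v in bbox]  # [x, y, width, height]
    if segmentation is not None:
        entry["segmentation"] = segmentation    # RLE dictionary
    return entry


def check_times(results: List[dict]) -> None:
    """Raise ValueError if one image has entries with different time values."""
    times: Dict[Tuple[int, int], Set[float]] = defaultdict(set)
    for r in results:
        times[(r["scene_id"], r["image_id"])].add(r["time"])
    bad = {k: v for k, v in times.items() if len(v) > 1}
    if bad:
        raise ValueError(f"inconsistent time values: {bad}")


if __name__ == "__main__":
    # A box-only method: both detections in image (48, 3) report 0.25 s.
    results = [
        make_result(48, 3, 1, 0.9, 0.25, bbox=[10, 20, 30, 40]),
        make_result(48, 3, 2, 0.4, 0.25, bbox=[50, 60, 15, 15]),
    ]
    check_times(results)
    print(json.dumps(results[0]))
```

Measuring the time once per image and passing the same value to every entry from that image is the simplest way to satisfy the consistency rule.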

#### 5.2 Ground-truth in the COCO format

The ground-truth 2D bounding boxes and masks are provided in the COCO format in scene_gt_coco.json files, which can be downloaded from here. These files were generated from annotations saved in the BOP format for the test, validation, and PBR training images of the seven core datasets, using the script calc_gt_coco.py. The same script can also be used to generate COCO annotations for the other BOP datasets.

The "ignore" flag of an annotation in the COCO format is True if less than 10% of the object instance is visible.
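The rule reduces to a one-line predicate. In this sketch, `visib_fract` stands for the visible fraction of the object silhouette; the BOP format stores such a value in scene_gt_info.json, but treat the exact key as an assumption. `evaluated_annotations` is a hypothetical helper that filters a loaded scene_gt_coco.json dictionary down to the instances a method is expected to find.

```python
from typing import List

VISIB_THRESHOLD = 0.1  # instances less than 10% visible get ignore=True


def ignore_flag(visib_fract: float) -> bool:
    """Return True if an instance is less than 10% visible."""
    return visib_fract < VISIB_THRESHOLD


def evaluated_annotations(coco_gt: dict) -> List[dict]:
    """Keep only annotations whose "ignore" flag is not set."""
    return [a for a in coco_gt.get("annotations", [])
            if not a.get("ignore", False)]
```

An instance that is exactly 10% visible is not ignored, which matches the requirement to detect objects that are at least 10% visible.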

#### 5.3 Terms & conditions

1. To be considered for the awards and for inclusion in a publication about the challenge, the authors need to provide documentation of the method (including specifications of the computer used) through the online submission form.
2. The winners need to present their methods at the awards reception.
3. After the submitted results are evaluated (by the online evaluation system), the authors can decide whether to make the scores visible to the public.

### 6. Evaluation methodology

For the pose estimation task, BOP Challenge 2022 follows the same evaluation methodology as the 2019 and 2020 editions to ensure directly comparable results.

For the object detection and segmentation tasks, we adopt the metrics of the COCO Object Detection Challenge. The final metric is the Average Precision (AP) averaged over ten Intersection over Union (IoU) thresholds from 0.50 to 0.95 with a step of 0.05 (IoU=.50:.05:.95).
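To illustrate how AP is averaged over IoU thresholds, here is a simplified sketch for one category in one image with bounding boxes. It uses greedy matching in decreasing score order and 101-point interpolated precision, as COCO does, but it omits area ranges, detection limits, ignore flags, and cross-image accumulation. It is not the official evaluation (eval_bop22_coco.py) and not a substitute for the COCOeval class in pycocotools.

```python
from typing import List, Sequence, Tuple

# IoU=.50:.05:.95, i.e. the ten thresholds 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]

Box = Sequence[float]  # [x, y, width, height]


def box_iou(a: Box, b: Box) -> float:
    """IoU of two boxes given as [x, y, width, height]."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Tuple[float, Box]], gts: List[Box],
                      thr: float) -> float:
    """AP at one IoU threshold (simplified, single image and category)."""
    if not gts:
        raise ValueError("need at least one ground-truth box")
    matched = [False] * len(gts)
    tp_flags = []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best, best_iou = -1, thr
        for j, g in enumerate(gts):  # best unmatched GT with IoU >= thr
            if not matched[j]:
                iou = box_iou(box, g)
                if iou >= best_iou:
                    best, best_iou = j, iou
        if best >= 0:
            matched[best] = True
        tp_flags.append(best >= 0)
    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Precision envelope: make precision non-increasing with rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    total = 0.0
    for r in (i / 100 for i in range(101)):  # 101 recall levels
        total += next((p for p, rec in zip(precisions, recalls) if rec >= r), 0.0)
    return total / 101


def coco_style_ap(dets: List[Tuple[float, Box]], gts: List[Box]) -> float:
    """Mean of the per-threshold AP values over IoU=.50:.05:.95."""
    return sum(average_precision(dets, gts, t)
               for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


if __name__ == "__main__":
    gt = [[0, 0, 10, 10]]
    det = [(0.9, [0, 0, 10, 7.2])]  # IoU 0.72: matched at 0.50 to 0.70 only
    print(coco_style_ap(det, gt))   # 0.5
```

The example shows why the averaged metric rewards tight localization: a detection with IoU 0.72 counts as correct at only five of the ten thresholds, so its AP is 0.5 rather than 1.0.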

Note that a method is required to detect/segment only objects that are at least 10% visible. If a method also detects/segments objects that are less than 10% visible, these detections are ignored and not counted as false positives. To evaluate precision (measured by AP), a detection/segmentation method is not allowed to use any information about the annotated object instances. This is in contrast to the pose estimation task, where a method is allowed to use the IDs of the visible object instances and where performance is evaluated by recall (measured by AR; see reasons for using recall).
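The sketch below shows one way ignored instances can be handled when matching at a single IoU threshold. It follows the matching order used in COCO-style evaluation as we understand it: prefer the best unmatched non-ignored instance, and fall back to an ignored instance only when none qualifies. A detection matched to an ignored instance is labeled neither a true positive nor a false positive. `classify_detections` is a hypothetical helper. In this simplification each instance can be matched at most once, so a duplicate detection of the same instance still counts as a false positive.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # [x, y, width, height]


def box_iou(a: Box, b: Box) -> float:
    """IoU of two boxes given as [x, y, width, height]."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def classify_detections(dets: List[Tuple[float, Box]],
                        gts: List[Tuple[Box, bool]],
                        thr: float = 0.5) -> List[str]:
    """Label each detection "tp", "fp", or "ignored" (in input order).

    dets: (score, box) pairs; gts: (box, ignore_flag) pairs.
    """
    order = sorted(range(len(dets)), key=lambda i: dets[i][0], reverse=True)
    matched = [False] * len(gts)
    labels = [""] * len(dets)
    for i in order:
        box = dets[i][1]
        label = "fp"
        for want_ignored in (False, True):  # non-ignored instances first
            best, best_iou = -1, thr
            for j, (g, ignored) in enumerate(gts):
                if ignored != want_ignored or matched[j]:
                    continue
                iou = box_iou(box, g)
                if iou >= best_iou:
                    best, best_iou = j, iou
            if best >= 0:
                matched[best] = True
                label = "ignored" if want_ignored else "tp"
                break
        labels[i] = label
    return labels


if __name__ == "__main__":
    gts = [([0, 0, 10, 10], False),   # visible instance
           ([20, 0, 10, 10], True)]   # instance less than 10% visible
    dets = [(0.9, [0, 0, 10, 10]), (0.8, [20, 0, 10, 10]), (0.7, [50, 50, 5, 5])]
    print(classify_detections(dets, gts))  # ['tp', 'ignored', 'fp']
```

Detections labeled "ignored" are dropped before computing precision and recall, which is how detecting a barely visible object avoids a penalty.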

### 7. Organizers

Martin Sundermeyer, DLR German Aerospace Center
Tomáš Hodaň, Reality Labs at Meta
Yann Labbé, Inria Paris
Gu Wang, Tsinghua University
Eric Brachmann, Niantic
Bertram Drost, MVTec
Carsten Rother, Heidelberg University
Jiří Matas, Czech Technical University in Prague