Results of the BOP Challenge 2022 are published in:
M. Sundermeyer, T. Hodaň, Y. Labbé, G. Wang, E. Brachmann, B. Drost, C. Rother, J. Matas: BOP Challenge 2022 on Detection, Segmentation and Pose Estimation of Specific Rigid Objects, CVPRW 2023 (CV4MR workshop)
[PDF, SLIDES, VIDEO 1, VIDEO 2, BIB]
The 2022 edition of the BOP Challenge focuses on the task of 6D pose estimation of specific rigid objects, as in the 2019 and 2020 editions, and additionally on the tasks of 2D object detection and 2D object segmentation. The new tasks were introduced because many recent pose estimation methods first detect/segment the objects and then estimate their poses from the predicted image regions. Evaluating the detection/segmentation stage and the pose estimation stage separately helps to better understand advances in each of the two stages. The 2022 edition also introduces new awards for the best detection/segmentation method and for the best pose estimation method that relies on the default detections/segmentations ("synt+real" version) provided by the BOP organizers.
For the pose estimation task, the 2019, 2020 and 2022 editions share the same task definition, evaluation methodology, list of core datasets, instructions for participation, and leaderboard. This page describes only updates introduced in the 2022 edition.
The pose estimation task is defined as in the 2019 challenge edition. The detection and segmentation tasks are defined as follows.
Training input: At training time, a detection/segmentation method is provided with a set of training images showing objects annotated with ground-truth 2D bounding boxes (for the detection task) and binary masks (for the segmentation task).
Test input: At test time, the method is given an image without any information about the object instances visible in the image (in the pose estimation task, the method is additionally given the identities of the visible object instances).
Test output: The method produces a list of detections/segmentations with confidences.
Awards for 6D object pose estimation methods:
Awards for 2D object detection/segmentation methods:
The conditions that a method needs to fulfill to qualify for the awards are the same as in the BOP Challenge 2019.
The challenge is sponsored by Reality Labs at Meta and Niantic, which together donated $4000 ($2000 each, before tax) for the award money.
For the pose estimation task, the instructions for participation are the same as in BOP Challenge 2019. For the detection/segmentation task, participants should submit a file which follows the format described below and which includes predictions for the set of images defined in files test_targets_bop19.json (these files are included in the base archives of the datasets; the set of considered images is the same for all tasks).
Results for all tasks should be submitted via this form. The online evaluation system uses script eval_bop19_pose.py to evaluate pose estimation results and script eval_bop22_coco.py to evaluate detection/segmentation results.
We adopt and extend the format of the COCO Object Detection Challenge. Participants need to submit one JSON file per BOP dataset with results in the following format:
[ { "scene_id" : int, "image_id" : int, "category_id" : int, "score" : float, "bbox" : [x,y,width,height], "segmentation" : RLE, "time" : float, }, ... ]
scene_id, image_id, and category_id are the IDs of the scene, image, and object, respectively.
score is the confidence of the estimate (the range of confidence values is not restricted).
bbox is a list of four integers defining the predicted amodal 2D bounding box (amodal = encompassing the whole object silhouette, including the occluded parts). The box is of size (width, height) and its top-left corner is at (x, y).
segmentation is a dictionary defining the predicted modal binary mask (covering only the visible object part) in the run-length encoding (RLE) format. A binary mask can be converted to the RLE format using the binary_mask_to_rle() function from the BOP Toolkit.
time is the time the method took to detect and/or segment all objects in image image_id from scene scene_id. All estimates with the same scene_id and image_id must have the same value of time. Report the wall time from the point right after the raw data (the image, 3D object models, etc.) is loaded to the point when the final detections/segmentations are available (a single real number in seconds; -1 if not available).
If your method produces only bounding boxes or only segmentation masks, omit the missing entry ("bbox" or "segmentation") from the file.
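For illustration, the following is a minimal Python sketch of assembling a submission file in this format. It assumes the BOP Toolkit is installed and that binary_mask_to_rle() is available in bop_toolkit_lib.pycoco_utils (check your toolkit version); the IDs, score, mask, bounding box, and output file name are hypothetical.

import json
import numpy as np
from bop_toolkit_lib import pycoco_utils  # Assumed location of binary_mask_to_rle().

results = []

# Hypothetical prediction for one object instance in image 3 of scene 48.
mask = np.zeros((480, 640), dtype=np.uint8)  # Modal binary mask (visible part only).
mask[100:180, 200:330] = 1

results.append({
    "scene_id": 48,
    "image_id": 3,
    "category_id": 1,            # Object ID.
    "score": 0.87,               # Confidence (range is not restricted).
    "bbox": [195, 95, 140, 90],  # Predicted amodal box [x, y, width, height].
    "segmentation": pycoco_utils.binary_mask_to_rle(mask),
    "time": 0.15,                # Same value for all estimates in this image; -1 if not available.
})

with open("my-method_ycbv-test.json", "w") as f:
    json.dump(results, f)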
The ground-truth 2D bounding boxes and masks are provided in the COCO format in files scene_gt_coco.json. These JSON files can be downloaded from here. The JSON files were generated for test, validation and PBR training images of the seven core datasets from annotations saved in the BOP format using script calc_gt_coco.py. This script can be used to generate COCO annotations also for the other BOP datasets.
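As a sketch of how these annotations can be consumed, the scene_gt_coco.json files follow the COCO format and can be read, for example, with pycocotools; the dataset path below assumes the standard BOP layout and is only an example.

from pycocotools.coco import COCO

# Path assumes the standard BOP layout: <dataset>/test/<scene_id>/scene_gt_coco.json.
coco = COCO("ycbv/test/000048/scene_gt_coco.json")

img_ids = coco.getImgIds()
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_ids[:1])):
    bbox = ann["bbox"]                 # Amodal 2D box [x, y, width, height].
    mask = coco.annToMask(ann)         # Modal binary mask decoded from RLE.
    ignore = ann.get("ignore", False)  # True if the instance is visible from less than 10%.
    print(ann["category_id"], bbox, int(mask.sum()), ignore)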
The "ignore" flag of an annotation in the COCO format is True if the object instance is visible from less than 10%.
For the pose estimation task, BOP Challenge 2022 follows the same evaluation methodology as the 2019 and 2020 editions to ensure directly comparable results.
For the object detection and segmentation tasks, we adopt the metrics of the COCO Object Detection Challenge. The final metric is the Average Precision (AP) averaged over Intersection-over-Union (IoU) thresholds ranging from 0.50 to 0.95 with a step of 0.05.
Note that a method is required to detect/segment only objects that are visible from at least 10%. However, if a method detects/segments also objects that are visible from less than 10%, these are ignored and not counted as false positives. In order to evaluate precision (measured by AP), a detection/segmentation method is not allowed to use any information about the annotated object instances – this is in contrast to the pose estimation task, where a method is allowed to use IDs of the visible object instances and where the performance is evaluated by recall (measured by AR; see reasons for using recall).
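The official evaluation is performed by eval_bop22_coco.py. Purely to illustrate the underlying COCO metric, a result file in the format above can be scored against a COCO-format ground-truth file with pycocotools roughly as follows (the file names are hypothetical, and details such as the handling of the "ignore" flag and of the per-scene ground-truth files are omitted):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("scene_gt_coco.json")          # COCO-format ground truth.
coco_dt = coco_gt.loadRes("my_results.json")  # Predictions in the format above.

# Use iouType="segm" to evaluate masks instead of boxes (masks may need to be
# converted to the compressed RLE format expected by pycocotools).
ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()  # Reports AP averaged over IoU thresholds 0.50:0.05:0.95.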
Martin Sundermeyer, Google
Tomáš Hodaň, Reality Labs at Meta
Yann Labbé, Inria Paris
Gu Wang, Tsinghua University
Eric Brachmann, Niantic
Bertram Drost, MVTec
Carsten Rother, Heidelberg University
Jiří Matas, Czech Technical University in Prague