BOP Challenge 2022

News about BOP Challenge 2022 (join the BOP Google group for all updates):

Results of the BOP Challenge 2022 are published in:
Martin Sundermeyer, Tomáš Hodaň, Yann Labbé, Gu Wang, Eric Brachmann, Bertram Drost, Carsten Rother, Jiří Matas,
BOP Challenge 2022 on Detection, Segmentation and Pose Estimation of Specific Rigid Objects, CVPRW 2023 (CV4MR workshop)
[PDF, SLIDES, VIDEO 1, VIDEO 2, BIB]

1. Introduction

The 2022 edition of the BOP Challenge focuses on the task of 6D pose estimation of specific rigid objects, as in the 2019 and 2020 editions, and additionally on the tasks of 2D object detection and 2D object segmentation. The new tasks reflect the design of many recent pose estimation methods, which first detect/segment objects and then estimate the object poses from the predicted regions. Evaluating the detection/segmentation stage and the pose estimation stage separately will help us better understand advances in each of the two stages. The 2022 edition introduces new awards for the best detection/segmentation method and for the best pose estimation method that relies on the default detections/segmentations ("synt+real" version) provided by the BOP organizers.

For the pose estimation task, the 2019, 2020 and 2022 editions share the same task definition, evaluation methodology, list of core datasets, instructions for participation, and leaderboard. This page describes only updates introduced in the 2022 edition.

2. Important dates

3. Tasks

The pose estimation task is defined as in the 2019 challenge edition. The detection and segmentation tasks are defined as follows.

Training input: At training time, a detection/segmentation method is provided with a set of training images showing objects annotated with ground-truth 2D bounding boxes (for the detection task) and binary masks (for the segmentation task).

Test input: At test time, the method is given an image without any information about object instances visible in the image (in the pose estimation task, the method is also given identities of the visible object instances).

Test output: The method produces a list of detections/segmentations with confidences.

4. Awards

Awards for 6D object pose estimation methods:

  • The Overall Best Method – The top method on the core datasets.
  • The Best RGB-Only Method – The top RGB-only method on the core datasets.
  • The Best Fast Method – The top method on the core datasets with the average running time per image below 1s.
  • The Best BlenderProc-Trained Method – The top method on the core datasets trained only with the provided BlenderProc images.
  • The Best Single-Model Method – The top method on the core datasets which uses a single machine learning model (typically a neural network) per dataset.
  • The Best Open-Source Method – The top method on the core datasets whose source code is publicly available.
  • The Best Method Using Provided Detections/Segmentations – The top method on the core datasets that relies on default detections/segmentations ("synt+real" version).
  • The Best Method on Dataset D – The top method on each of the available datasets.

Awards for 2D object detection/segmentation methods:

  • The Overall Best Detection Method – The top object detection method on the core datasets.
  • The Best BlenderProc-Trained Detection Method – The top object detection method on the core datasets trained only with the provided BlenderProc images.
  • The Overall Best Segmentation Method – The top object segmentation method on the core datasets.
  • The Best BlenderProc-Trained Segmentation Method – The top object segmentation method on the core datasets trained only with the provided BlenderProc images.

The conditions which a method needs to fulfill in order to qualify for the awards are the same as in BOP Challenge 2019.

The challenge is sponsored by Reality Labs at Meta and Niantic, who donated $4000 in total ($2000 each, before tax) for the award money.

5. How to participate

For the pose estimation task, the instructions for participation are the same as in BOP Challenge 2019. For the detection/segmentation tasks, participants should submit a file that follows the format described below and includes predictions for the set of images defined in the test_targets_bop19.json file of each dataset (these files are included in the base archives of the datasets; the set of considered images is the same for all tasks).

Results for all tasks should be submitted via this form. The online evaluation system uses script eval_bop19_pose.py to evaluate pose estimation results and script eval_bop22_coco.py to evaluate detection/segmentation results.

5.1 Format of detection/segmentation results

We adopt and extend the format of the COCO Object Detection Challenge. Participants need to submit one JSON file per BOP dataset with results in the following format:

[
    {
        "scene_id"     : int,
        "image_id"     : int,
        "category_id"  : int,
        "score"        : float,
        "bbox"         : [x,y,width,height],  # mandatory only in 2D detection tasks
        "segmentation" : RLE,                 # mandatory only in 2D segmentation tasks
        "time"         : float,
    },
    ...
]
  • scene_id, image_id, and category_id are the IDs of the scene, image, and object, respectively.
  • score is the confidence of the estimate (the range of confidence values is not restricted).
  • bbox is a list of four integers defining the predicted amodal 2D bounding box (amodal = encompassing the whole object silhouette, including the occluded parts). The box is of size (width, height) and its top-left corner is at (x,y).
  • segmentation is a dictionary defining the predicted modal binary mask (covering only the visible object part) in the Run-length encoding (RLE) format. A binary mask can be converted to the RLE format using the binary_mask_to_rle() function from the BOP Toolkit.
  • time is the time the method took to detect and/or segment all objects in the image image_id from scene scene_id. All estimates with the same scene_id and image_id must have the same value of time. Report the wall time from the point right after the raw data (the image, 3D object models etc.) is loaded to the point when the final detection/segmentation is available (a single real number in seconds, -1 if not available).

The "bbox" item is mandatory for 2D detection tasks, and the "segmentation" item is mandatory for 2D segmentation tasks. If submitting to a 2D detection task a file containing "segmentation" items (which will be ignored), we ask users to compress using gzip (resulting in a file with .json.gz extension).

5.2 Ground-truth in the COCO format

The ground-truth 2D bounding boxes and masks are provided in the COCO format in files scene_gt_coco.json. These JSON files can be downloaded from here. The JSON files were generated for test, validation and PBR training images of the seven core datasets from annotations saved in the BOP format using script calc_gt_coco.py. This script can be used to generate COCO annotations also for the other BOP datasets.

The "ignore" flag of an annotation in the COCO format is True if the object instance is visible from less than 10%.

5.3 Terms & conditions

  1. To be considered for the awards and for inclusion in a publication about the challenge, the authors need to provide documentation of the method (including specifications of the used computer) through the online submission form.
  2. The winners need to present their methods at the awards reception.
  3. After the submitted results are evaluated (by the online evaluation system), the authors can decide whether to make the scores visible to the public.

6. Evaluation methodology

For the pose estimation task, BOP Challenge 2022 follows the same evaluation methodology as the 2019 and 2020 editions to ensure directly comparable results.

For the object detection and segmentation tasks, we adopt the metrics of the COCO Object Detection Challenge. The final metric is the Average Precision (AP) averaged over Intersection-over-Union thresholds from 0.50 to 0.95 with a step of 0.05 (IoU=.50:.05:.95).

Note that a method is required to detect/segment only object instances that are at least 10% visible. However, if a method also detects/segments instances that are less than 10% visible, these are ignored and not counted as false positives. To evaluate precision (measured by AP), a detection/segmentation method is not allowed to use any information about the annotated object instances. This is in contrast to the pose estimation task, where a method is allowed to use the IDs of the visible object instances and where performance is evaluated by recall (measured by AR; see reasons for using recall).
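
For illustration, the sketch below shows how a COCO-style AP of this kind can be computed with pycocotools. This is only an approximation of the official evaluation, which is performed by eval_bop22_coco.py from the BOP Toolkit and may differ in details (e.g., in how the "ignore" flag and the "time" field are handled):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("scene_gt_coco.json")          # ground-truth annotations
coco_dt = coco_gt.loadRes("my_results.json")  # predictions to be evaluated

# Use iouType="bbox" for the detection task and "segm" for segmentation.
ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()  # the first reported number is AP over IoU=.50:.05:.95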

7. Organizers

Martin Sundermeyer, Google
Tomáš Hodaň, Reality Labs at Meta
Yann Labbé, Inria Paris
Gu Wang, Tsinghua University
Eric Brachmann, Niantic
Bertram Drost, MVTec
Carsten Rother, Heidelberg University
Jiří Matas, Czech Technical University in Prague