Model-based tasks on seen objects:
Model-based tasks on unseen objects:
Model-free tasks on unseen objects:
Used in the 2019, 2020, 2022, and 2023 challenges. Can be evaluated on BOP-Classic datasets.
Training input: At training time, a method is provided a set of RGB-D training images showing objects annotated with ground-truth 6D poses, and 3D mesh models of the objects (typically with a color texture). A 6D pose is defined by a matrix $\mathbf{P} = [\mathbf{R} \, | \, \mathbf{t}]$, where $\mathbf{R}$ is a 3D rotation matrix and $\mathbf{t}$ is a 3D translation vector. The matrix $\mathbf{P}$ defines a rigid transformation from the 3D space of the object model to the 3D space of the camera.
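As an illustration, a minimal NumPy sketch (with made-up pose values) of applying a pose $\mathbf{P} = [\mathbf{R} \, | \, \mathbf{t}]$ to points of an object model:

```python
import numpy as np

# Hypothetical pose: rotation R (3x3) and translation t (3x1), mapping model -> camera space.
R = np.eye(3)
t = np.array([[0.0], [0.0], [0.5]])

# Points sampled from the 3D object model (Nx3), expressed in the model space.
model_points = np.random.rand(100, 3)

# The same points expressed in the 3D space of the camera.
camera_points = (R @ model_points.T + t).T
```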
Test input: At test time, the method is given an RGB-D image unseen during training and a list $L = [(o_1, n_1), \dots, (o_m, n_m)]$, where $n_i$ is the number of instances of an object $o_i$ that are visible in the image. The method can use default detections (results of GDRNPPDet_PBRReal, the best 2D detection method from 2022 for ModelBased-2DDet-Seen).
Test output: The method produces a list $E = [E_1, \dots, E_m]$, where $E_i$ is a list of $n_i$ pose estimates with confidences for instances of object $o_i$.
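A hedged illustration (with made-up object IDs, poses, and scores) of how the lists $L$ and $E$ might be represented in Python:

```python
import numpy as np

# Hypothetical test input: object 1 is visible in 2 instances, object 5 in 1 instance.
L = [(1, 2), (5, 1)]

# Hypothetical test output: for each (o_i, n_i), a list of n_i pose estimates with confidences.
placeholder_pose = {"R": np.eye(3), "t": np.zeros((3, 1)), "score": 0.9}
E = [
    [dict(placeholder_pose), dict(placeholder_pose)],  # two estimates for object 1
    [dict(placeholder_pose)],                          # one estimate for object 5
]
```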
Evaluation methodology: The error of an estimated pose w.r.t. the ground-truth pose is calculated by three pose-error functions (see Section 2.2 in the BOP 2020 paper): VSD (Visible Surface Discrepancy), MSSD (Maximum Symmetry-Aware Surface Distance), and MSPD (Maximum Symmetry-Aware Projection Distance).
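The functions are specified in the referenced paper; for illustration only, below is a minimal NumPy sketch of one of them, the Maximum Symmetry-Aware Surface Distance (MSSD), with assumed variable names:

```python
import numpy as np

def mssd(R_est, t_est, R_gt, t_gt, vertices, symmetries):
    """Sketch of the Maximum Symmetry-Aware Surface Distance (MSSD).

    vertices:   Nx3 array of model vertices.
    symmetries: list of 4x4 transforms describing the object's global symmetries
                (include the identity for objects without symmetries).
    """
    pts_est = R_est @ vertices.T + t_est  # vertices under the estimated pose
    errors = []
    for S in symmetries:
        # Vertices under the ground-truth pose composed with the symmetry transform.
        pts_sym = S[:3, :3] @ vertices.T + S[:3, 3:4]
        pts_gt = R_gt @ pts_sym + t_gt
        errors.append(np.linalg.norm(pts_est - pts_gt, axis=0).max())
    return min(errors)
```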
Used in the 2024 challenge. Can be evaluated on BOP-Classic and BOP-H3 datasets.
Training input: As in ModelBased-6DLoc-Seen.
Test input: At test time, the method is given an RGB-D image unseen during training that shows an arbitrary number of instances of an arbitrary number of objects, with all objects being from one specified dataset (e.g. YCB-V). No prior information about the visible object instances is provided. The method can use default detections (results of GDRNPPDet_PBRReal, the best 2D detection method from 2022 for ModelBased-2DDet-Seen).
Test output: As in ModelBased-6DLoc-Seen.
Evaluation methodology: Similarly to the evaluation methodology used for 2D detection (adopted from the COCO 2020 Object Detection Challenge), the 6D detection accuracy is measured by the Average Precision (AP). This metric is calculated with the following two pose-error functions (see Section 2.2 in the BOP 2020 paper):
Used in the 2022 and 2023 challenges. Can be evaluated on BOP-Classic and BOP-H3 datasets.
Training input: At training time, a method is provided a set of training images showing objects that are annotated with ground-truth 2D bounding boxes. The boxes are amodal, i.e., covering the whole object silhouette including the occluded parts. The method can also use 3D mesh models that are available for the objects.
Test input: At test time, the method is given an RGB-D image unseen during training that shows an arbitrary number of instances of an arbitrary number of objects, with all objects being from one specified dataset (e.g. YCB-V). No prior information about the visible object instances is provided.
Test output: The method produces a list of object detections with confidences, with each detection defined by an amodal 2D bounding box.
Evaluation methodology: Following the evaluation methodology from the COCO 2020 Object Detection Challenge, the detection accuracy is measured by the Average Precision (AP). Specifically, a per-object $\text{AP}_O$ score is calculated by averaging the precision at multiple Intersection over Union (IoU) thresholds: $[0.5, 0.55, \dots, 0.95]$. The accuracy of a method on a dataset $D$ is measured by $\text{AP}_D$, calculated by averaging the per-object $\text{AP}_O$ scores, and the overall accuracy on the core datasets is measured by $\text{AP}_C$, defined as the average of the per-dataset $\text{AP}_D$ scores. Predictions that correctly match annotated object instances visible from less than 10% (such instances are not considered in the evaluation) are filtered out and not counted as false positives. Up to 100 predictions per image (those with the highest confidence scores) are considered.
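A simplified sketch of the averaging described above (the COCO-style matching of predictions to ground truth, which yields the precision at a given IoU threshold, is hidden behind a hypothetical precision_at_iou callable):

```python
import numpy as np

def ap_object(precision_at_iou):
    """AP_O: average precision over IoU thresholds 0.5, 0.55, ..., 0.95."""
    thresholds = np.arange(0.5, 0.951, 0.05)
    return float(np.mean([precision_at_iou(tau) for tau in thresholds]))

def ap_dataset(ap_object_scores):
    """AP_D: average of the per-object AP_O scores on one dataset."""
    return float(np.mean(ap_object_scores))

def ap_core(ap_dataset_scores):
    """AP_C: average of the per-dataset AP_D scores over the core datasets."""
    return float(np.mean(ap_dataset_scores))
```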
Used in the 2022 and 2023 challenges. Can be evaluated on BOP-Classic datasets.
Training input: At training time, a method is provided a set of training images showing objects that are annotated with ground-truth 2D binary masks. The masks are modal, i.e., covering only the visible object part. The method can also use 3D mesh models that are available for the objects.
Test input: At test time, the method is given an RGB-D image unseen during training that shows an arbitrary number of instances of an arbitrary number of objects, with all objects being from one specified dataset (e.g. YCB-V). No prior information about the visible object instances is provided.
Test output: The method produces a list of object segmentations with confidences, with each segmentation defined by a modal 2D binary mask.
Evaluation methodology: As in ModelBased-2DDet-Seen, with the only difference being that IoU is calculated on binary masks instead of bounding boxes.
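For illustration, IoU on binary masks can be computed as in the following sketch (assuming boolean NumPy arrays of the same image size):

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """IoU of two modal binary masks (boolean arrays of identical shape)."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection) / float(union) if union > 0 else 0.0
```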
Used in the 2023 challenge. Can be evaluated on BOP-Classic datasets.
Training input: At training time, a method is provided a set of RGB-D training images showing training objects annotated with ground-truth 6D poses, and 3D mesh models of the objects (typically with a color texture). The 6D object pose is defined as in ModelBased-6DLoc-Seen. The method can use 3D mesh models that are available for the training objects.
Object-onboarding input: The method is provided 3D mesh models of test objects that were not seen during training. To onboard each object (e.g. to render images/templates or fine-tune a neural network), the method can spend up to 5 minutes of wall-clock time on a single computer with up to one GPU. The time is measured from the point right after the raw data (e.g. 3D mesh models) is loaded to the point when the object is onboarded. The method can use a subset of the BlenderProc images (links "PBR-BlenderProc4BOP training images") originally provided for Tasks 1–3 – the method can use as many images from this set as could be rendered within the limited onboarding time (rendering one image is assumed to take 2 seconds; rendering and any additional processing need to fit within the 5 minutes). The method can also render custom images/templates but cannot use any real images of the object in the onboarding stage. The object representation (which may be given by a set of templates, an ML model, etc.) needs to be fixed after onboarding (it cannot be updated on test images).
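For example, under the assumed 2 s per rendered image, the accounting of the 5-minute onboarding budget might look as follows (the values are hypothetical):

```python
ONBOARDING_BUDGET_S = 5 * 60       # 5 minutes of wall-clock time per object
RENDER_TIME_PER_IMAGE_S = 2        # assumed cost of rendering one BlenderProc image

other_processing_s = 60            # hypothetical time for fine-tuning / template extraction
max_images = (ONBOARDING_BUDGET_S - other_processing_s) // RENDER_TIME_PER_IMAGE_S
print(max_images)                  # -> 120 images fit in the remaining budget
```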
Test input: At test time, the method is given an RGB-D image unseen during training and a list $L = [(o_1, n_1), \dots, (o_m, n_m)]$, where $n_i$ is the number of instances of a test object $o_i$ that are visible in the image. The method can use default detections/segmentations (results of CNOS, the best 2D detection method for ModelBased-2DDet-Unseen in 2023).
Test output: As in ModelBased-6DLoc-Seen.
Evaluation methodology: As in ModelBased-6DLoc-Seen.
Used in the 2024 challenge. Can be evaluated on BOP-Classic and BOP-H3 datasets.
Training input: As in ModelBased-6DLoc-Unseen.
Object-onboarding input: As in ModelBased-6DLoc-Unseen.
Test input: At test time, the method is given an RGB-D image unseen during training that shows an arbitrary number of instances of an arbitrary number of objects, with all objects being from one specified dataset (e.g. YCB-V). No prior information about the visible object instances is provided. The method can use default detections/segmentations (results of CNOS, the best 2D detection method for ModelBased-2DDet-Unseen in 2023).
Test output: As in ModelBased-6DLoc-Seen.
Evaluation methodology: As in ModelBased-6DDet-Seen.
Used in the 2023 challenge. Can be evaluated on BOP-Classic and BOP-H3 datasets.
Training input: At training time, a method is provided a set of RGB-D training images showing training objects that are annotated with ground-truth 2D bounding boxes. The boxes are amodal, i.e., covering the whole object silhouette including the occluded parts. The method can also use 3D mesh models that are available for the training objects.
Object-onboarding input: As in ModelBased-6DLoc-Unseen.
Test input: At test time, the method is given an RGB-D image unseen during training that shows an arbitrary number of instances of an arbitrary number of test objects, with all objects being from one specified dataset (e.g. YCB-V). No prior information about the visible object instances is provided.
Test output: As in ModelBased-2DDet-Seen.
Evaluation methodology: As in ModelBased-2DDet-Seen.
Used in the 2023 challenge. Can be evaluated on BOP-Classic datasets.
Training input: At training time, a method is provided a set of RGB-D training images showing training objects that are annotated with ground-truth 2D binary masks. The masks are modal, i.e., covering only the visible object part. The method can also use 3D mesh models that are available for the training objects.
Object-onboarding input: As in ModelBased-6DLoc-Unseen.
Test input: As in ModelBased-2DDet-Unseen.
Test output: As in ModelBased-2DSeg-Seen.
Evaluation methodology: As in ModelBased-2DSeg-Seen.
Used in the 2024 challenge. Can be evaluated on BOP-H3 datasets.
Training input: As in ModelBased-6DLoc-Unseen.
Object-onboarding input: The method is provided reference video(s) of test objects that were not seen during training. 3D models of test objects are not available. The method can use only one of the two following types of reference videos:
Test input: At test time, the method is given an RGB-D image unseen during training that shows an arbitrary number of instances of an arbitrary number of test objects, with all objects being from one specified dataset (e.g. YCB-V). No prior information about the visible object instances is provided. The method can use default detections/segmentations (results of CNOS, the best 2D detection method for model-based unseen objects, adapted to model-free unseen objects by replacing templates rendered from CAD models with reference images from the onboarding video(s)).
Test output: As in ModelBased-6DLoc-Seen.
Evaluation methodology: As in ModelBased-6DDet-Seen.
Used in the 2024 challenge. Can be evaluated on BOP-H3 datasets.
Training input: As in ModelBased-2DDet-Unseen.
Object-onboarding input: As in ModelFree-6DDet-Unseen.
Test input: As in ModelBased-2DDet-Unseen.
Test output: As in ModelBased-2DDet-Seen.
Evaluation methodology: As in ModelBased-2DDet-Seen.