BOP: Benchmark for 6D Object Pose Estimation

Method: Co-op (SAM6D, 1 Hypo, RGBD)

User	sp9103
Publication
Implementation
Training image modalities	RGB-D
Test image modalities	RGB-D
Description	Submitted to: BOP Challenge 2024
Training data: MegaPose-GSO and MegaPose-ShapeNetCore
Onboarding data: 42 rendered templates
Used 3D models: Default, CAD
Notes: Our method consists of three steps: coarse estimation, pose refinement, and optional pose selection. We train one model for each step and use the same models for all datasets. For each detection, we extract the top-k hypotheses from our coarse network, and each hypothesis is refined by the refinement network. When k > 1, the refined hypotheses are scored by our pose selection network, and the best-scoring one is taken as the output. k is specified in the title.
Our coarse estimator is based on local feature matching between the query image and multiple pre-rendered templates. We model the query and rendered images as an aggregation of multiple patches. The coarse network finds matches between the patch centers of the input crop and those of the rendered templates. From the 42 templates, the top-k templates are selected, and pose hypotheses are generated with RANSAC-PnP in the RGB case and with MAGSAC++ [A] in the RGB-D case.
The pose refiner is an optical-flow-based method similar to GenFlow [B]. For faster inference, however, we do not follow the RAFT structure, which lets us bypass the inner-loop calculation. We model flow estimation as a probabilistic regression of a Laplace distribution.
We use CroCo [C] pretraining for our coarse estimator, pose refiner, and pose selection model. Note that the inputs to our neural networks are RGB images only.
In this submission, each coarse hypothesis is refined 1 or 5 times; the number of refinement iterations is given in the submission name. The 2D detector used is specified in parentheses in the title, and it uses the FastSAM object proposals.
[A] Barath et al.: MAGSAC++, a fast, reliable and accurate robust estimator, CVPR 2020
[B] Moon et al.: GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects, CVPR 2024
[C] Weinzaepfel et al.: CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow, ICCV 2023
Computer specifications	Intel(R) Core(TM) i9-14900K, RTX4090
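
The three-step control flow in the description (top-k coarse hypotheses, iterative refinement, selection only when k > 1) can be sketched roughly as follows. This is a minimal illustration, not the submitted implementation: the function name estimate_pose, the (pose, score) input format, and the refine and score callables are all hypothetical stand-ins for the coarse, refinement, and selection networks.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")


def estimate_pose(
    coarse_hypotheses: Sequence[Tuple[Pose, float]],
    refine: Callable[[Pose], Pose],
    score: Callable[[Pose], float],
    k: int = 1,
    refine_iters: int = 1,
) -> Pose:
    """Hypothetical sketch of the hypothesize / refine / select loop.

    coarse_hypotheses: (pose, coarse_score) pairs for one detection
                       (assumed output format of a coarse estimator).
    refine:            one refinement step, pose -> pose (stand-in).
    score:             pose-selection score, higher is better (stand-in).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not coarse_hypotheses:
        raise ValueError("at least one coarse hypothesis is required")

    # Step 1: keep the k hypotheses with the highest coarse score.
    top_k = sorted(coarse_hypotheses, key=lambda h: h[1], reverse=True)[:k]

    # Step 2: refine every kept hypothesis for a fixed number of iterations.
    refined: List[Pose] = []
    for pose, _coarse_score in top_k:
        for _ in range(refine_iters):
            pose = refine(pose)
        refined.append(pose)

    # Step 3: selection is only needed when several hypotheses survive.
    if len(refined) == 1:
        return refined[0]
    return max(refined, key=score)


# Toy usage: a "pose" is a single float and the true value is 10.0.
def toy_refine(p: float) -> float:
    return p + 0.5 * (10.0 - p)  # move halfway toward the target


def toy_score(p: float) -> float:
    return -abs(p - 10.0)  # closer to the target scores higher


toy_hypotheses = [(0.0, 0.9), (8.0, 0.7), (20.0, 0.1)]
single = estimate_pose(toy_hypotheses, toy_refine, toy_score, k=1)
multi = estimate_pose(toy_hypotheses, toy_refine, toy_score, k=2)
```

With k = 1 the selection step is skipped entirely, which matches the "1 Hypo" configuration named in the title. In the toy data, the hypothesis with the best coarse score is not the best after refinement, which is the situation the optional selection step is meant to handle.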

Public submissions

Date	Submission name	Dataset
2024-09-13 05:15	refine iter 1, magsac++ / Intel(R) Core(TM) i9-14900K, RTX4090	HB
2024-09-13 05:15	refine iter 1, magsac++ / Intel(R) Core(TM) i9-14900K, RTX4090	IC-BIN
2024-09-13 05:16	refine iter 1, magsac++ / Intel(R) Core(TM) i9-14900K, RTX4090	ITODD
2024-09-13 05:16	refine iter 1, magsac++ / Intel(R) Core(TM) i9-14900K, RTX4090	LM-O
2024-09-13 05:18	refine iter 1, magsac++ / Intel(R) Core(TM) i9-14900K, RTX4090	T-LESS
2024-09-13 05:19	refine iter 1, magsac++ / Intel(R) Core(TM) i9-14900K, RTX4090	TUD-L
2024-09-13 05:20	refine iter 1, magsac++ / Intel(R) Core(TM) i9-14900K, RTX4090	YCB-V