BOP: Benchmark for 6D Object Pose Estimation

Submission: NOVASplat (FastSAM)/HOPEv2/NOVASplatv_hope


Submission name

NOVASplatv_hope

Submission time (UTC)

Oct. 1, 2025, 5:37 p.m.

User

orestisvaggelis

Task

Model-free 2D detection of unseen objects

Dataset

HOPEv2


Evaluation scores

AP:	0.399
AP50:	0.631
AP75:	0.418
AP_large:	0.465
AP_medium:	0.058
AP_small:	0.089
AR1:	0.442
AR10:	0.494
AR100:	0.494
AR_large:	0.568
AR_medium:	0.112
AR_small:	0.089
average_time_per_image:	0.335 s

Method: NOVASplat (FastSAM)

User	orestisvaggelis
Publication	Not published
Implementation	Will be made public upon publication
Training image modalities	RGB
Test image modalities	RGB
Description	Submitted to: BOP Challenge 2025
Training data: None
Onboarding data: Model-free onboarding using the static onboarding sequences.
Notes:
Onboarding: The known camera poses and the estimated object pose in each camera are used to align the two onboarding sequences (upward and downward views) into a common coordinate frame. Using these poses, the method detects SIFT features on the masked object regions, matches features between consecutive frames (sequential matching), and triangulates the matches to obtain a sparse SfM point cloud of the object. The resulting camera poses and sparse point cloud serve as input for training a 3D Gaussian Splatting model for 14k iterations. The trained model then renders 642 template images from camera viewpoints distributed on an icosphere around the object. The entire onboarding stage takes 4 min 30 s on average.
Inference: The method adopts a variant of the CNOS proposal stage that uses FastSAM to generate object proposals. DINOv3 features are extracted from these proposals, and the matching stage closely follows the CNOS pipeline.
Computer specifications	Workstation with 28 CPU cores and an NVIDIA RTX 4090
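The onboarding notes align the upward and downward sequences through the known camera poses and the estimated object poses. One way to picture this is rigid-transform composition: once every camera of a sequence is expressed relative to the object, both sequences share the object's coordinate frame. The sketch below assumes 4x4 homogeneous poses, where T_world_cam maps camera to sequence-world coordinates and T_cam_obj is the estimated object pose in that camera. The function names and the anchoring on a single reference frame are hypothetical simplifications, not the submission's code.

```python
import numpy as np


def invert_pose(T):
    """Invert a 4x4 rigid transform [R t; 0 1] without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def cameras_in_object_frame(T_world_cam, T_cam_obj, ref_idx=0):
    """Express all camera poses of one onboarding sequence in the object frame.

    Hypothetical helper (not from the NOVASplat code).
    T_world_cam: list of 4x4 camera-to-world poses (known for the sequence).
    T_cam_obj:   list of 4x4 object-to-camera poses (estimated per frame).
    ref_idx:     frame used to place the object in the sequence's world frame;
                 a robust version would combine estimates from many frames.
    """
    # Object pose in this sequence's world frame, from one reference frame.
    T_world_obj = T_world_cam[ref_idx] @ T_cam_obj[ref_idx]
    T_obj_world = invert_pose(T_world_obj)
    # Camera-to-object poses; running this for both sequences puts them
    # in one shared (object) coordinate frame.
    return [T_obj_world @ T for T in T_world_cam]
```

Calling `cameras_in_object_frame` once per sequence yields camera poses in a single frame, which is what a joint SfM and 3DGS reconstruction needs.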
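The sparse point cloud comes from triangulating matched SIFT features across posed views. The description does not name the SfM tool or the triangulation method, so the following is only a minimal sketch of standard linear (DLT) two-view triangulation, assuming known 3x4 projection matrices P = K [R | t] and one pixel correspondence; `triangulate_dlt` is a hypothetical name.

```python
import numpy as np


def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices K [R | t].
    x1, x2: matched pixel coordinates (u, v) in each image.
    Returns the point in the frame in which the projection matrices are defined.
    """
    # Each view contributes two linear constraints on the homogeneous point X:
    # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

With noise-free correspondences this recovers the point exactly; practical SfM pipelines add outlier rejection and bundle adjustment on top.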
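The figure of 642 templates matches the vertex count of an icosahedron subdivided three times (V = 10 · 4^n + 2, with n = 3), which suggests how the viewpoints may be distributed. The sketch below generates such viewpoints by repeatedly splitting each triangle into four and projecting new vertices back onto the sphere. The subdivision level, the radius, and the function name are assumptions; the description does not state them, and each camera would additionally need to be oriented toward the object center.

```python
import numpy as np

# Hypothetical sketch: subdivision level and radius are assumptions.
PHI = (1 + 5 ** 0.5) / 2  # golden ratio
BASE_VERTS = [(-1, PHI, 0), (1, PHI, 0), (-1, -PHI, 0), (1, -PHI, 0),
              (0, -1, PHI), (0, 1, PHI), (0, -1, -PHI), (0, 1, -PHI),
              (PHI, 0, -1), (PHI, 0, 1), (-PHI, 0, -1), (-PHI, 0, 1)]
BASE_FACES = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
              (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
              (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
              (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]


def icosphere_viewpoints(subdivisions=3, radius=1.0):
    """Camera positions on a sphere of the given radius, one per icosphere vertex."""
    verts = [np.array(v, dtype=float) / np.linalg.norm(v) for v in BASE_VERTS]
    faces = BASE_FACES
    for _ in range(subdivisions):
        midpoints = {}  # edge (i, j) with i < j -> index of its midpoint vertex

        def midpoint(i, j):
            key = (min(i, j), max(i, j))
            if key not in midpoints:
                m = verts[i] + verts[j]
                verts.append(m / np.linalg.norm(m))  # project back onto the sphere
                midpoints[key] = len(verts) - 1
            return midpoints[key]

        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            # Split each triangle into four smaller ones.
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return radius * np.array(verts)
```

Sharing midpoints through the edge dictionary keeps vertices unique, so the counts follow 12, 42, 162, 642 for zero to three subdivisions.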
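The inference notes say that matching closely follows CNOS. As a rough illustration, a CNOS-style matcher compares each proposal descriptor with every object's template descriptors by cosine similarity, aggregates the best few template scores per object, and assigns the proposal to the highest-scoring object. This is a minimal sketch, assuming precomputed descriptors (for example, one DINOv3 embedding per proposal crop and per rendered template). The function name and the `top_k` and `min_score` values are placeholders, not settings reported for this submission.

```python
import numpy as np


def match_proposals(proposal_feats, template_feats, top_k=5, min_score=0.5):
    """Assign each proposal to its best-matching object via template similarity.

    Hypothetical helper (not the CNOS or NOVASplat code).
    proposal_feats: (P, D) array, one descriptor per proposal.
    template_feats: (O, T, D) array, T template descriptors for each of O objects.
    Returns (object_ids, scores, keep_mask), each of length P.
    """
    # L2-normalize so that dot products are cosine similarities.
    p = proposal_feats / np.linalg.norm(proposal_feats, axis=-1, keepdims=True)
    t = template_feats / np.linalg.norm(template_feats, axis=-1, keepdims=True)
    sims = np.einsum("pd,otd->pot", p, t)  # (P, O, T)
    # Per object, average the top-k template similarities.
    k = min(top_k, sims.shape[-1])
    obj_scores = np.sort(sims, axis=-1)[..., -k:].mean(axis=-1)  # (P, O)
    object_ids = obj_scores.argmax(axis=1)
    scores = obj_scores[np.arange(len(p)), object_ids]
    return object_ids, scores, scores >= min_score
```

Averaging several top templates, rather than taking the single best one, makes the score less sensitive to one spurious template match.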