| Submission name | Flanders Make |
|---|---|
| Submission time (UTC) | Oct. 1, 2025, 11:59 p.m. |
| User | luxmanramamoorthy |
| Task | Model-based 6D detection of unseen objects |
| Dataset | IPD |
| Description | |
| Evaluation scores | |

| User | luxmanramamoorthy |
|---|---|
| Publication | |
| Implementation | python |
| Training image modalities | RGB |
| Test image modalities | RGB-D |
| Description | Methodology: Instance Segmentation + Feature Matching for 6D Object Pose Estimation
Our pipeline estimates 6D object poses for the BOP Challenge on the BOP IPD dataset. It combines instance segmentation, feature matching, and 3D correspondence mapping to estimate the full 6D pose of objects in unseen scenes.
1. Overview
The core idea of our method is to use instance segmentation and robust feature matching to find correspondences between objects detected in a scene and reference objects with known poses. The pipeline consists of the following stages:
- Instance segmentation (Mask R-CNN)
- Mask extraction
- Feature matching (SuperGlue)
- Pixel-to-3D mapping
- Pose transfer
2. Detailed Pipeline
2.1 Instance Segmentation
We train a Mask R-CNN model on the dataset to perform instance segmentation.
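Output: a segmented polygon mask for each detected object.
2.2 Mask Extraction
From the segmentation results, we extract individual object masks.
2.3 Feature Matching
For each extracted mask, we use SuperGlue to match features between the detected object and reference masks with known poses.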
2.4 Pixel-to-3D Mapping
We use the depth image to map matched pixel locations to 3D coordinates in the camera coordinate frame:
$$\mathbf{p} = d(u, v)\, K^{-1} (u, v, 1)^\top$$
where $K$ is the camera intrinsic matrix, $(u, v)$ are the pixel coordinates, and $d(u, v)$ is the depth value at that pixel.
Result: a set of 3D point correspondences $\{ (\mathbf{P}_\text{test}^i, \mathbf{P}_\text{ref}^i) \}_i$.
2.5 Pose Transfer
Given the known 6D pose of the reference mask, $(\mathbf{R}_\text{ref}, \mathbf{t}_\text{ref})$, and the matched 3D point correspondences, we compute the rigid transformation $(\mathbf{R}, \mathbf{t})$ that aligns the reference object with the detected object:
$$(\mathbf{R}, \mathbf{t}) = \arg\min_{\mathbf{R} \in SO(3),\, \mathbf{t} \in \mathbb{R}^3} \sum_i \left\lVert \mathbf{P}_\text{test}^i - \left( \mathbf{R}\, \mathbf{P}_\text{ref}^i + \mathbf{t} \right) \right\rVert^2$$
This least-squares problem is typically solved in closed form with the Kabsch algorithm.
Output: the estimated 6D pose (rotation + translation) of the detected object, obtained by combining $(\mathbf{R}, \mathbf{t})$ with the known reference pose.
3. Pipeline Diagram
The overall pipeline can be summarized as:
Input: RGB image + depth image
→ 1. Instance segmentation (Mask R-CNN)
→ 2. Mask extraction
→ 3. Feature matching: use SuperGlue to match mask features between the detected object and reference masks with known poses
→ 4. Pixel-to-3D mapping: map matched pixels to 3D points using depth images and camera intrinsics
→ 5. Pose transfer: use the 3D point correspondences and the known reference pose to compute the full 6D pose of the detected object
→ Output: estimated 6D pose (rotation + translation)
4. Advantages of the Approach
5. Limitations and Future Work
Future directions include:
- Incorporating pose refinement networks to improve accuracy.
- Using multi-view fusion to improve robustness.
- Adding uncertainty estimation to quantify pose confidence.
6. Summary
Our approach introduces a modular pipeline for 6D pose estimation that combines instance segmentation, SuperGlue feature matching, and 3D correspondence mapping. |
| Computer specifications | Dell Pro Max T2; Ubuntu 24.04 LTS. Base: Dell Pro Max Tower T2 (FCT2250) CTO Base; Processor: Intel Core Ultra 7 265K (30 MB cache, 20 cores, 20 threads, 3.3 GHz to 5.5 GHz, 125W); OS language pack: No Factory Install Language Software; System management: Intel vPro Disabled; Intel performance/capabilities: iRST not selected; Chassis options: Dell Pro Max Tower T2 with 1500W (80 Plus Platinum) PSU; Memory: 64GB (2 x 32 GB, DDR5, 5600 MT/s, non-ECC); Graphics card: NVIDIA RTX 4000 ADA, 20 GB GDDR6, 4 DP; Graphics holder: T2 Graphic Card Holder; Thermal cooling: No Fans Included; Thermal cooling: Premium CPU Air Cooler; Storage device bay 9: 2TB SSD TLC with DRAM |
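Section 2.2 (Mask Extraction) turns the polygon masks from Mask R-CNN into per-object masks. The submission does not say how this is done; the sketch below is one plausible way, assuming each polygon arrives as an N x 2 array of (x, y) vertices. It rasterizes the polygon with the even-odd ray-casting rule at pixel centres and crops the image to the mask's bounding box. `polygon_to_mask` and `crop_to_mask` are hypothetical helper names.

```python
import numpy as np


def polygon_to_mask(polygon: np.ndarray, height: int, width: int) -> np.ndarray:
    """Rasterize a closed polygon (N x 2 array of (x, y) vertices) into a boolean
    H x W mask, testing each pixel centre with the even-odd ray-casting rule."""
    ys, xs = np.mgrid[0:height, 0:width]
    px = xs + 0.5  # pixel-centre x coordinates
    py = ys + 0.5  # pixel-centre y coordinates
    inside = np.zeros((height, width), dtype=bool)
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if y0 == y1:
            continue  # horizontal edges never cross a horizontal ray
        # Pixels whose rightward horizontal ray crosses this edge
        crosses = (y0 > py) != (y1 > py)
        # x coordinate where the edge intersects the ray's row
        x_hit = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        # Each crossing to the right of the pixel toggles inside/outside
        inside ^= crosses & (px < x_hit)
    return inside


def crop_to_mask(image: np.ndarray, mask: np.ndarray):
    """Crop `image` (H x W or H x W x C) to the bounding box of `mask`, zeroing
    background pixels. Returns (crop, (row_offset, col_offset)) or (None, None)."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None, None  # empty mask
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1
    crop = image[r0:r1, c0:c1].copy()
    crop[~mask[r0:r1, c0:c1]] = 0  # blank out pixels outside the object
    return crop, (int(r0), int(c0))
```

Keeping the crop offset lets matched keypoints found inside the crop be shifted back to full-image pixel coordinates before the depth lookup in section 2.4.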
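Section 2.3 matches features between a detected object and a reference mask with SuperGlue. SuperGlue is a learned graph-neural-network matcher and cannot be reproduced in a few lines, so the sketch below swaps in a much simpler technique, mutual-nearest-neighbour descriptor matching. It only illustrates the data flow: keypoints are first restricted to each mask, then matched. It assumes keypoints are given as (x, y) arrays with pixel centres at integer coordinates and descriptors as one row per keypoint (for example from a detector such as SuperPoint, which SuperGlue is commonly paired with). `keypoints_in_mask`, `mutual_nn_match`, `match_within_masks`, and `max_dist` are hypothetical names.

```python
import numpy as np


def keypoints_in_mask(keypoints: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """True for each (x, y) keypoint that lands on a mask pixel."""
    cols = np.clip(np.round(keypoints[:, 0]).astype(int), 0, mask.shape[1] - 1)
    rows = np.clip(np.round(keypoints[:, 1]).astype(int), 0, mask.shape[0] - 1)
    return mask[rows, cols]


def mutual_nn_match(desc_a: np.ndarray, desc_b: np.ndarray, max_dist: float = np.inf) -> np.ndarray:
    """Stand-in for SuperGlue: keep (i, j) pairs where desc_a[i] and desc_b[j] are each
    other's nearest neighbour (Euclidean) and closer than max_dist. Returns M x 2 indices."""
    if len(desc_a) == 0 or len(desc_b) == 0:
        return np.empty((0, 2), dtype=int)
    # Pairwise squared distances: |a|^2 + |b|^2 - 2 a.b
    d2 = (
        np.sum(desc_a**2, axis=1)[:, None]
        + np.sum(desc_b**2, axis=1)[None, :]
        - 2.0 * desc_a @ desc_b.T
    )
    nn_ab = np.argmin(d2, axis=1)  # best b for every a
    nn_ba = np.argmin(d2, axis=0)  # best a for every b
    idx_a = np.arange(len(desc_a))
    mutual = nn_ba[nn_ab] == idx_a
    close = d2[idx_a, nn_ab] <= max_dist**2
    keep = mutual & close
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)


def match_within_masks(kp_test, desc_test, mask_test, kp_ref, desc_ref, mask_ref, max_dist=np.inf):
    """Match only keypoints inside each mask; return paired (x, y) pixel arrays."""
    sel_t = keypoints_in_mask(kp_test, mask_test)
    sel_r = keypoints_in_mask(kp_ref, mask_ref)
    pairs = mutual_nn_match(desc_test[sel_t], desc_ref[sel_r], max_dist)
    return kp_test[sel_t][pairs[:, 0]], kp_ref[sel_r][pairs[:, 1]]
```

The mutual check discards one-sided matches, which loosely plays the role of SuperGlue's match filtering; it will be far less robust on textureless or symmetric industrial parts.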
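The pixel-to-3D mapping of section 2.4, $\mathbf{p} = d(u, v)\, K^{-1} (u, v, 1)^\top$, vectorizes directly. The sketch below assumes matched pixels as an N x 2 array of (u, v), a depth image indexed as `depth[v, u]`, and a hypothetical `depth_scale` multiplier for raw depth values. In BOP-format datasets, the per-image `depth_scale` in `scene_camera.json` is documented as converting raw depth to millimetres; check this against the dataset documentation. Zero depth is treated as missing and returned as NaN. `backproject` is a hypothetical name.

```python
import numpy as np


def backproject(pixels: np.ndarray, depth: np.ndarray, K: np.ndarray, depth_scale: float = 1.0) -> np.ndarray:
    """Lift N (u, v) pixels to N x 3 camera-frame points: p = d(u, v) * K^{-1} [u, v, 1]^T.
    Rows whose depth is zero or negative (missing) are returned as NaN."""
    u = pixels[:, 0].astype(float)
    v = pixels[:, 1].astype(float)
    rows = np.round(v).astype(int)
    cols = np.round(u).astype(int)
    d = depth[rows, cols].astype(float) * depth_scale  # depth at each matched pixel
    homog = np.stack([u, v, np.ones_like(u)], axis=0)  # 3 x N homogeneous pixel coordinates
    rays = np.linalg.solve(K, homog)                    # K^{-1} [u, v, 1]^T without forming K^{-1}
    points = (rays * d).T                               # scale each ray by its depth -> N x 3
    points[d <= 0] = np.nan                             # flag pixels with no valid depth
    return points
```

For example, with focal length 500 px, principal point (320, 240), and depth 2.0, pixel (370, 240) maps to (0.2, 0.0, 2.0): it lies 50 px right of the principal point, i.e. 50 / 500 × 2.0 = 0.2 along x.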
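The least-squares problem in section 2.5 has the closed-form Kabsch solution: centre both point sets, take the SVD of their 3 x 3 cross-covariance, and flip the last singular direction if needed so that $\det \mathbf{R} = +1$ (a proper rotation rather than a reflection). A minimal NumPy sketch, assuming the correspondences arrive as two N x 3 arrays with NaN rows marking points without valid depth; `kabsch` is a hypothetical name, and the function returns the relative transform $(\mathbf{R}, \mathbf{t})$ with $\mathbf{P}_\text{test} \approx \mathbf{R}\,\mathbf{P}_\text{ref} + \mathbf{t}$.

```python
import numpy as np


def kabsch(P_ref: np.ndarray, P_test: np.ndarray):
    """Least-squares rigid transform (R, t) minimizing sum ||P_test_i - (R P_ref_i + t)||^2.
    P_ref, P_test: N x 3 corresponding points. Rows containing NaN are dropped first."""
    valid = ~(np.isnan(P_ref).any(axis=1) | np.isnan(P_test).any(axis=1))
    A, B = P_ref[valid], P_test[valid]
    if len(A) < 3:
        raise ValueError("need at least 3 valid, non-collinear correspondences")
    ca, cb = A.mean(axis=0), B.mean(axis=0)  # centroids
    H = (A - ca).T @ (B - cb)                # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # det(V U^T) is +1 or -1; correct the sign so R is a rotation, not a reflection
    sign = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, sign])
    R = Vt.T @ D @ U.T
    t = cb - R @ ca                          # translation aligns the centroids
    return R, t
```

Plain least squares is sensitive to wrong matches; in practice Kabsch is often wrapped in a robust estimator such as RANSAC, which the submission does not mention.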
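Section 2.5 states that the relative transform and the known reference pose together give the full 6D pose, but does not write out the composition. Assuming $\mathbf{P}_\text{ref}$ is expressed in the reference view's camera frame and $(\mathbf{R}_\text{ref}, \mathbf{t}_\text{ref})$ maps a point $\mathbf{X}$ in object-model coordinates into that frame ($\mathbf{X}$, $\mathbf{R}_\text{test}$, and $\mathbf{t}_\text{test}$ are notation introduced here), substituting one rigid transform into the other gives:

$$
\begin{aligned}
\mathbf{P}_\text{ref} &= \mathbf{R}_\text{ref}\,\mathbf{X} + \mathbf{t}_\text{ref}, \qquad
\mathbf{P}_\text{test} = \mathbf{R}\,\mathbf{P}_\text{ref} + \mathbf{t} \\
\Rightarrow\; \mathbf{P}_\text{test} &= (\mathbf{R}\,\mathbf{R}_\text{ref})\,\mathbf{X} + (\mathbf{R}\,\mathbf{t}_\text{ref} + \mathbf{t}) \\
\Rightarrow\; \mathbf{R}_\text{test} &= \mathbf{R}\,\mathbf{R}_\text{ref}, \qquad
\mathbf{t}_\text{test} = \mathbf{R}\,\mathbf{t}_\text{ref} + \mathbf{t}
\end{aligned}
$$

So the pose of the detected object in the test camera frame is the reference pose followed by the Kabsch transform. If the reference points were instead already expressed in object-model coordinates, $(\mathbf{R}, \mathbf{t})$ would itself be the final pose.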