Multi-modal monocular endoscopic depth and pose estimation with edge-guided self-supervision - Report - MDSpire

By Xinwei Ju, Rema Daher, Danail Stoyanov, Sophia Bano, Francisco Vasconcelos

May 10, 2026

Clinical Report: Comprehensive Monocular Endoscopic Depth and Pose Assessment

Overview

This report introduces PRISM, a self-supervised framework for monocular endoscopic depth and pose estimation that feeds luminance cues to the depth network (DepthNet) and edge cues to the pose network (PoseNet). The model is more robust to illumination challenges and achieves state-of-the-art depth estimation accuracy.

Background

Gastrointestinal endoscopy is crucial for early cancer detection and treatment, yet it suffers from blind spots and operator variability. Computer-assisted navigation built on better depth and pose estimation can improve lesion detection and overall examination quality. Self-supervised learning is a promising way to address these challenges.

Data Highlights

On phantom data, PRISM achieves state-of-the-art depth estimation and comparable pose accuracy. On real data, it shows improved robustness to illumination and sharper depth contrast around fold edges.

Key Findings

  • PRISM integrates luminance cues into DepthNet and edge cues into PoseNet, improving geometric learning in endoscopy.
  • A stage-wise training strategy enhances pose accuracy without degrading depth quality.
  • Training on real-world data yields better generalization than training on synthetic data.
  • Optimal temporal sampling varies significantly across datasets and models.
  • Edge maps provide clearer structural boundaries for motion estimation.
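
To make the luminance and edge cues named in the findings above more concrete, the following minimal Python sketch derives both from a single RGB frame. The report does not say how PRISM computes these maps, so everything here is an assumption for illustration: the Rec. 601 luma weights, the Sobel gradient operator, and the hypothetical function names `luminance` and `sobel_edges` are stand-ins, not the authors' method.

```python
import numpy as np


def luminance(rgb):
    """Weighted sum of R, G, B (Rec. 601 weights; an assumed choice).

    rgb: H x W x 3 float array with values in [0, 1].
    """
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights


def sobel_edges(gray):
    """Sobel gradient magnitude of an H x W array (an assumed edge operator)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    # Replicate border pixels so the output keeps the input shape.
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w), dtype=float)
    gy = np.zeros((h, w), dtype=float)
    # Accumulate the 3 x 3 neighbourhood one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)


# Toy frame: dark left half, bright right half (a stand-in for a fold edge).
frame = np.zeros((4, 6, 3))
frame[:, 3:, :] = 1.0

lum = luminance(frame)    # per-pixel brightness map
edges = sobel_edges(lum)  # responds only at the dark/bright boundary
```

In this toy frame the edge map is zero inside both flat regions and nonzero only in the two columns that straddle the brightness step, which is the kind of structural boundary the last finding refers to.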

Clinical Implications

The PRISM framework can enhance the accuracy of depth and pose estimation in endoscopic procedures, potentially leading to better detection rates of lesions. Its application may support adherence to updated clinical standards for colonoscopy performance.

Conclusion

PRISM represents a significant advancement in monocular endoscopic imaging, offering a structured approach to improve depth and pose estimation under challenging conditions. Its implementation could enhance clinical outcomes in gastrointestinal endoscopy.
