Multi-modal monocular endoscopic depth and pose estimation with edge-guided self-supervision

By
Xinwei Ju
Rema Daher
Danail Stoyanov
Sophia Bano
Francisco Vasconcelos

May 10, 2026

International Journal of Computer Assisted Radiology and Surgery

Overview

This report introduces PRISM, a self-supervised framework that enhances depth and pose estimation in endoscopy by integrating luminance and edge cues. The model demonstrates improved robustness to illumination challenges and achieves state-of-the-art depth estimation accuracy.

Background

Gastrointestinal endoscopy is crucial for early cancer detection and treatment, yet it suffers from blind spots and operator variability. Computer-assisted navigation, which depends on accurate depth and pose estimation, can improve lesion detection and overall examination quality. Self-supervised learning offers a promising way to address these challenges.

Data Highlights

On phantom data, PRISM achieves state-of-the-art depth estimation and comparable pose accuracy. On real data, it shows improved robustness to illumination and sharper depth contrast around fold edges.

Key Findings

PRISM integrates luminance cues into DepthNet and edge cues into PoseNet, improving geometric learning in endoscopy.
A stage-wise training strategy enhances pose accuracy without degrading depth quality.
Training on real-world data yields better generalization than training on synthetic data.
Optimal temporal sampling varies significantly across datasets and models.
Edge maps provide clearer structural boundaries for motion estimation.
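
To make the two input cues named in the findings above concrete, here is a minimal sketch of how a luminance map and an edge map might be derived from a single RGB frame. The summary does not say how PRISM computes either cue, so the choices here are assumptions for illustration only: Rec. 601 luma weights for luminance and a 3x3 Sobel gradient magnitude for edges. The function names `luminance` and `sobel_edges` are hypothetical and are not taken from the paper.

```python
import numpy as np


def luminance(rgb):
    """Return a luminance map from an HxWx3 float RGB image in [0, 1].

    Assumption: Rec. 601 luma weights; the paper may use another definition.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights


def sobel_edges(gray):
    """Return a Sobel gradient-magnitude map of an HxW image, scaled to [0, 1].

    Assumption: a plain Sobel filter stands in for whatever edge
    extractor the authors actually use.
    """
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    h, w = gray.shape
    # Replicate border pixels so the output keeps the input size.
    padded = np.pad(gray, 1, mode="edge")
    gx = np.zeros((h, w), dtype=float)
    gy = np.zeros((h, w), dtype=float)
    # Accumulate the 3x3 filter response as nine shifted, weighted copies.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    return magnitude / peak if peak > 0 else magnitude
```

In a setup like the one the findings describe, the luminance map would accompany the frame fed to the depth network and the edge map would accompany the frame pair fed to the pose network. How the cues are fused with the RGB input is not stated in this summary and is left open here.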

Clinical Implications

The PRISM framework can enhance the accuracy of depth and pose estimation in endoscopic procedures, potentially leading to better detection rates of lesions. Its application may support adherence to updated clinical standards for colonoscopy performance.

Conclusion

PRISM advances monocular endoscopic depth and pose estimation by pairing luminance cues with depth prediction and edge cues with pose prediction, improving robustness under challenging illumination. Its implementation could enhance clinical outcomes in gastrointestinal endoscopy.