Self-Supervised Monocular Depth Estimation via Non-Lambertian Image Decomposition in SHADeS
Overview
This study introduces SHADeS, a novel self-supervised monocular depth estimation framework that improves robustness to specular reflections in endoscopic images by decomposing images into albedo, shading, and specular reflection components. The method outperforms state-of-the-art approaches on real and phantom colonoscopy datasets, enabling better depth estimation and implicit specularity segmentation.
Background
Colorectal cancer is a leading cause of cancer-related mortality worldwide, with early detection significantly improving survival rates. Colonoscopy is the primary diagnostic tool but suffers from visibility challenges due to reflections and lighting variations in the endoscopic environment. Monocular depth estimation is critical for 3D reconstruction and navigation during endoscopy, yet existing methods struggle with non-Lambertian surfaces typical of wet, reflective tissue. Self-supervised learning approaches have advanced the field but remain limited by specular highlights that degrade depth estimation accuracy.
Data Highlights
| Method | Dataset | Performance |
| --- | --- | --- |
| SHADeS | Hyper Kvasir (real) | Improved robustness to specular reflections |
| SHADeS | C3VD (phantom) | Superior depth estimation accuracy |
| IID-SfMLearner | Various | Albedo extraction with specular artefacts |
| Monodepth2, MonoViT | Endoscopy data | Suboptimal under specular conditions |
Key Findings
SHADeS extends the Lambertian image decomposition model by adding a specular reflection component, giving the image model I = AS + M, where I is the input image, A the albedo, S the shading, and M the specular reflection.
The model jointly estimates depth, albedo, shading, and specular reflections, improving separation of true tissue color from reflections.
SHADeS effectively removes specular artefacts from albedo maps, unlike previous methods such as IID-SfMLearner.
The framework implicitly produces specularity segmentation masks and can generate inpainted images free of specular highlights.
SHADeS demonstrates superior monocular depth estimation performance on both real (Hyper Kvasir) and phantom (C3VD) colonoscopy datasets compared to state-of-the-art methods.
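The decomposition I = AS + M can be illustrated with a small NumPy sketch. This is not the SHADeS implementation: in the framework, networks predict the albedo, shading, and specular maps, whereas the arrays below are hand-made toy values. The function names, the threshold of 0.1, and the choice to derive the specularity mask by thresholding M are all assumptions made for illustration.

```python
import numpy as np

def recompose(albedo, shading, specular):
    """Rebuild the image as I = A * S + M (element-wise), clipped to [0, 1]."""
    return np.clip(albedo * shading + specular, 0.0, 1.0)

def specular_mask(specular, threshold=0.1):
    """Hypothetical mask: pixels whose specular term exceeds a chosen threshold."""
    return specular.max(axis=-1) > threshold

def remove_specular(albedo, shading):
    """Specular-free image: keep only the Lambertian part A * S."""
    return np.clip(albedo * shading, 0.0, 1.0)

# Toy 2x2 RGB example (values are illustrative, not from the paper).
albedo = np.full((2, 2, 3), 0.5)     # uniform tissue colour
shading = np.full((2, 2, 1), 0.8)    # single-channel shading, broadcast over RGB
specular = np.zeros((2, 2, 3))
specular[0, 0] = 0.6                 # one highlight pixel

image = recompose(albedo, shading, specular)
mask = specular_mask(specular)
clean = remove_specular(albedo, shading)

print(mask)
print(clean[0, 0])
```

In this toy case, the highlight pixel is flagged by the mask, and dropping M leaves the same Lambertian value at that pixel as everywhere else, which mirrors the idea of obtaining a specularity mask and a highlight-free image as by-products of the decomposition.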
Clinical Implications
Improved monocular depth estimation in endoscopy can enhance 3D reconstruction, aiding in the detection of missed lesions and better characterization of polyps. By effectively handling specular reflections, SHADeS can provide clearer visualization and more reliable navigation during colonoscopy. This advancement may contribute to higher early detection rates of colorectal cancer and improved training tools for endoscopists.
Conclusion
SHADeS represents a significant advancement in self-supervised monocular depth estimation for endoscopy by addressing the challenges posed by non-Lambertian surfaces and specular reflections. Its ability to jointly estimate depth and decompose image components enhances robustness and accuracy, with promising applications in colorectal cancer diagnosis and endoscopic navigation.
References
SfMLearner (Zhou et al. 2017) -- Unsupervised Learning of Depth and Ego-Motion from Video
MonoViT (Zhang et al. 2023) -- Transformer-based Monocular Depth Estimation
Monodepth2 (Godard et al. 2019) -- Self-Supervised Monocular Depth Estimation with Improved Losses