Random features meet MIL: a deep GP approach to colorectal MSI prediction - Report - MDSpire

By Shixuan Shen, Zeyang Wang, Tianmu Liu, Kangle Ma, Zhen Tian, Fuqiang Zhang, and Qingyue Zhang

December 15, 2025

Integrating Deep Gaussian Processes with Random Features for Colorectal Cancer MSI Prediction

Overview

This study presents a colorectal cancer classification model for predicting microsatellite instability (MSI) status that combines deep Gaussian processes with random feature expansion and multi-instance learning. The proposed approach achieves a higher AUC (0.895) than established CNN models, and the authors report improved robustness and interpretability in the weakly supervised setting.

Background

Colorectal cancer (CRC) is a leading cause of cancer mortality worldwide, and early diagnosis is critical for improving outcomes. Conventional deep learning models for CRC classification face challenges from data heterogeneity, weak supervision, and scarce instance-level annotations. Multi-instance learning (MIL) frameworks address weak supervision by treating each whole-slide image as a bag of tissue patches (instances) that carries only a slide-level label, but they often lack robustness and interpretability. Integrating deep Gaussian processes (DGPs) with MIL and attention mechanisms offers a promising way to improve both classification performance and clinical applicability.
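The report does not specify the exact attention form, so the following is only a minimal sketch of one common choice, attention-based MIL pooling in the style of Ilse et al. (2018): each patch embedding receives a softmax weight, and the slide (bag) embedding is the weighted average of its patches. The function name `attention_mil_pool`, the projection `V`, the vector `w`, and all dimensions are hypothetical, and the weights here are random rather than learned.

```python
import numpy as np

def attention_mil_pool(H, V, w):
    """Attention-based MIL pooling (sketch in the style of Ilse et al., 2018).

    H: (K, d) embeddings of the K patches (instances) in one slide (bag).
    V: (m, d) hypothetical projection matrix; w: (m,) hypothetical attention vector.
    Returns the bag embedding z with shape (d,) and attention weights a with shape (K,).
    """
    scores = np.tanh(H @ V.T) @ w      # (K,) unnormalized attention score per patch
    scores = scores - scores.max()     # shift for a numerically stable softmax
    e = np.exp(scores)
    a = e / e.sum()                    # softmax over the patches in this bag
    z = a @ H                          # attention-weighted average of patch embeddings
    return z, a

rng = np.random.default_rng(0)
K, d, m = 50, 128, 32                  # hypothetical: 50 patches with 128-dim features
H = rng.normal(size=(K, d))
V = rng.normal(scale=0.1, size=(m, d))
w = rng.normal(size=m)

z, a = attention_mil_pool(H, V, w)
top_patches = np.argsort(a)[::-1][:5]  # indices of the 5 highest-attention patches
print(z.shape, top_patches)
```

In a trained model, `V` and `w` would be learned jointly with the slide-level classifier, and the weights `a` are what an attention heatmap over the slide would display.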

Data Highlights

Model                        AUC
Proposed DGP-RF with MIL     0.895
ResNet                       0.777
EfficientNet                 0.791
ShuffleNet                   0.784
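To read the table, recall what slide-level AUC measures: the probability that a randomly chosen positive slide receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition (the normalized Mann-Whitney U statistic). The labels and scores are made-up toy values, not data from the study, and treating MSI-high as the positive class is an assumption.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as 1/2, i.e. the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]     # every positive-negative pair of slides
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

# Toy slide-level MSI probabilities (1 = MSI-high, 0 = MSS); not data from the study.
y = [1, 1, 0, 0, 1, 0]
p = [0.9, 0.4, 0.35, 0.1, 0.8, 0.5]
print(roc_auc(y, p))  # 8 of the 9 MSI/MSS pairs are ranked correctly, so AUC = 8/9
```

Because AUC depends only on the ranking of scores, it is unaffected by any monotone rescaling of the model's outputs.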

Key Findings

  • The proposed model integrates deep Gaussian processes with random feature expansion to better capture complex, non-linear tissue features.
  • Multi-instance learning makes effective use of weakly labeled whole-slide images because it requires only bag-level (slide-level) labels rather than patch-level annotations.
  • An attention-based aggregation mechanism highlights key regions within slides, improving interpretability and robustness.
  • The model achieves an AUC of 0.895 on the TCGA-CRC dataset, outperforming ResNet, EfficientNet, and ShuffleNet.
  • Random feature expansion keeps the Gaussian process computation tractable on large datasets, and bag-level supervision reduces the need for costly instance-level labeling.
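The report does not give the model's equations. As a rough sketch of the general technique, assuming the random-feature DGP construction of Cutajar et al. (2017), each GP layer with an RBF kernel is approximated by random Fourier features (Rahimi and Recht, 2007) followed by a linear map, so the cost grows linearly with the number of inputs instead of cubically as in an exact GP. The names `rff` and `dgp_rf_layer`, the lengthscale, and the feature counts are hypothetical; `Omega` and `W` are drawn from their priors here, whereas a trained model would learn them (or variational distributions over them).

```python
import numpy as np

def rff(X, Omega, sigma2=1.0):
    """Random Fourier features for an RBF kernel with variance sigma2.

    X: (N, d) inputs. Omega: (d, D) frequencies drawn from N(0, 1 / lengthscale**2).
    Returns Phi: (N, 2D), so that Phi @ Phi.T approximates the RBF Gram matrix.
    """
    D = Omega.shape[1]
    proj = X @ Omega                                   # (N, D) random projections
    return np.sqrt(sigma2 / D) * np.hstack([np.cos(proj), np.sin(proj)])

def dgp_rf_layer(F, Omega, W, sigma2=1.0):
    """One GP layer approximated as a linear model on random features: Phi(F) @ W."""
    return rff(F, Omega, sigma2) @ W

rng = np.random.default_rng(0)
N, d_in, D, d_hidden, lengthscale = 200, 16, 2000, 4, 2.0
X = rng.normal(size=(N, d_in))                         # hypothetical patch features

# Layer-1 features, plus a check of the approximation against the exact RBF kernel.
Omega0 = rng.normal(scale=1.0 / lengthscale, size=(d_in, D))
Phi = rff(X, Omega0)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-sq_dists / (2 * lengthscale ** 2))
max_err = np.abs(Phi @ Phi.T - K_exact).max()

# Prior sample from a two-layer DGP-RF: X -> F1 (hidden GP layer) -> F2 (latent output).
W0 = rng.normal(size=(2 * D, d_hidden))
F1 = dgp_rf_layer(X, Omega0, W0)                       # (N, d_hidden)
Omega1 = rng.normal(scale=1.0 / lengthscale, size=(d_hidden, D))
W1 = rng.normal(size=(2 * D, 1))
F2 = dgp_rf_layer(F1, Omega1, W1)                      # (N, 1)
print(Phi.shape, F1.shape, F2.shape, round(max_err, 3))
```

Using paired cosine and sine features keeps the diagonal of `Phi @ Phi.T` exactly equal to `sigma2`, and the off-diagonal approximation error shrinks roughly as 1/sqrt(D).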

Clinical Implications

This method offers a scalable and interpretable tool for automated colorectal cancer detection from histopathological images, potentially aiding pathologists in early diagnosis. Its ability to focus on critical tissue regions may strengthen clinical trust and ease integration into diagnostic workflows. Its reduced reliance on detailed annotations could also speed deployment in real-world settings.

Conclusion

Integrating deep Gaussian processes with random feature expansion and multi-instance learning improved colorectal cancer classification performance, as measured by AUC, and interpretability in this evaluation. The approach is a promising step toward clinically deployable automated cancer detection systems.

References

  1. TCGA-CRC dataset: The Cancer Genome Atlas colorectal cancer data
  2. ResNet: He et al., 2016
  3. EfficientNet: Tan and Le, 2019
  4. ShuffleNet: Zhang et al., 2018
