Bridging radiology and pathology: domain-generalized cross-modal learning for clinical applications - Report - MDSpire

By Xiang Zhong, Zhuo Gu, Manimurugan Shanmuganathan, Meng Li, Hao Sun, Mingming Du, Qian Chen, and Guoqin Jiang

February 16, 2026

Clinical Report: Cross-Modal AI Framework for Breast Cancer Diagnosis

Overview

A cross-modal AI framework that integrates mammography and histopathology images improves the accuracy and cross-institutional robustness of breast cancer diagnosis. The model achieves a mean AUC of 0.90 across public datasets (CBIS-DDSM, INbreast, BACH, and CAMELYON16/17), outperforming existing unimodal and multimodal approaches, and produces interpretable attention maps that link the two imaging modalities.

Background

Accurate breast cancer diagnosis often requires synthesizing information from multiple imaging modalities, such as mammography and histopathology. Traditional AI systems typically analyze single modalities and struggle with generalization across different clinical settings. Integrating cross-modal data with domain generalization techniques can enhance diagnostic performance and reliability. This study proposes a unified vision transformer-based framework that aligns mammographic and histopathological features to improve classification, lesion localization, and pathological grading.

Data Highlights

Dataset       | Modality                     | Performance Metric | Result
CBIS-DDSM     | Mammography + Histopathology | Mean AUC           | 0.90
INbreast      | Mammography + Histopathology | Mean AUC           | 0.90
BACH          | Histopathology               | Mean AUC           | 0.90
CAMELYON16/17 | Histopathology               | Mean AUC           | 0.90
Domain Gap    | Cross-Institutional          | Gap Reduction      | 0.03 vs. 0.06–0.10
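
The report does not say how the domain gap is computed. One common reading, assumed here, is the drop in AUC when a model trained at one institution is evaluated at an unseen one. The sketch below computes AUC by pairwise ranking and then takes that difference. The function names and the toy scores are illustrative only and are not study data.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def domain_gap(in_domain_auc: float, held_out_auc: float) -> float:
    """Assumed definition: AUC lost when moving to an unseen institution."""
    return in_domain_auc - held_out_auc


# Toy malignancy scores (made up for illustration, not from the study).
source_labels = [1, 1, 1, 0, 0, 0]
source_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
target_labels = [1, 1, 1, 0, 0, 0]
target_scores = [0.9, 0.8, 0.25, 0.3, 0.2, 0.1]

source_auc = auc(source_labels, source_scores)
target_auc = auc(target_labels, target_scores)
gap = domain_gap(source_auc, target_auc)
print(f"source AUC={source_auc:.3f} target AUC={target_auc:.3f} gap={gap:.3f}")
```

Under this definition, a smaller gap means performance holds up better across institutions, which is the sense in which 0.03 compares favorably with 0.06–0.10.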

Key Findings

  • The proposed cross-modal framework uses a shared vision transformer encoder with modality-specific adapters to jointly analyze mammography and histopathology images.
  • Weakly supervised patient-level contrastive alignment enables learning cross-modal correspondences without requiring pixel-level annotations.
  • Domain generalization strategies, including MixStyle augmentation and invariant risk minimization, reduce domain gaps and improve robustness across institutions.
  • Causal test-time adaptation further enhances model performance on unseen target domains.
  • The model simultaneously performs classification, lesion localization, and pathological grading, providing clinically relevant multi-task outputs.
  • Interpretability analyses demonstrate that attention maps generated by the model align suspicious mammographic regions with corresponding histopathological evidence, supporting clinical trust.
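
The patient-level contrastive alignment in the findings above can be sketched as a symmetric InfoNCE loss, in which the mammography and histopathology embeddings of the same patient form the positive pair and all other patients in the batch act as negatives. The report does not give the exact loss, so the InfoNCE form, the temperature value, and every name below are assumptions. In the real system the embeddings would come from the shared encoder and its adapters; toy vectors stand in for them here.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_row(logits: List[float], target: int) -> float:
    """Return -log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def patient_contrastive_loss(
    mammo: List[Vector], histo: List[Vector], temperature: float = 0.1
) -> float:
    """Symmetric InfoNCE; index i in both lists belongs to patient i."""
    if not mammo or len(mammo) != len(histo):
        raise ValueError("expected one embedding per modality per patient")
    m = [l2_normalize(v) for v in mammo]
    h = [l2_normalize(v) for v in histo]
    n = len(m)
    # Temperature-scaled cosine similarity between every mammo/histo pair.
    sim = [[dot(m[i], h[j]) / temperature for j in range(n)] for i in range(n)]
    # Mammography -> histopathology: each row should pick its own patient.
    m_to_h = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Histopathology -> mammography: same idea over the columns.
    h_to_m = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (m_to_h + h_to_m)


# Toy 2-D embeddings for two patients (illustrative values only).
mammo = [[1.0, 0.0], [0.0, 1.0]]
aligned_histo = [[0.9, 0.1], [0.1, 0.9]]   # matches patient order
swapped_histo = [[0.1, 0.9], [0.9, 0.1]]   # patients mismatched

aligned_loss = patient_contrastive_loss(mammo, aligned_histo)
swapped_loss = patient_contrastive_loss(mammo, swapped_histo)
print(f"aligned={aligned_loss:.4f} swapped={swapped_loss:.4f}")
```

Only patient identity is used as supervision, which is why this kind of objective needs no pixel-level annotations: the loss is low when each patient's two modalities sit closer to each other than to any other patient's.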

Clinical Implications

This integrated AI framework offers a more reliable and generalizable tool for breast cancer diagnosis by combining complementary imaging modalities. Its interpretable outputs, which link mammographic findings to histopathology, can assist clinicians in diagnostic decision-making. The improved cross-institutional robustness suggests potential for broader clinical deployment.

Conclusion

By advancing multimodal integration, domain generalization, and explainability, this study presents a clinically promising AI system for breast cancer diagnosis that outperforms current unimodal and multimodal baselines. The framework's interpretability and robustness mark a significant step toward real-world clinical application.

References

  1. Lee et al. 2017 -- A curated mammography data set for use in computer-aided detection and diagnosis research
  2. Moreira et al. 2012 -- INbreast: toward a full-field digital mammographic database
  3. Aresta et al. 2019 -- BACH: Grand challenge on breast cancer histology images
  4. Litjens et al. 2018 -- CAMELYON dataset of breast cancer sentinel lymph node sections
