Bridging radiology and pathology: domain-generalized cross-modal learning for clinical applications

By Xiang Zhong, Zhuo Gu, Manimurugan Shanmuganathan, Meng Li, Hao Sun, Mingming Du, Qian Chen, and Guoqin Jiang
February 16, 2026

npj Digital Medicine

Overview

A novel AI framework that integrates mammography and histopathology images improves the accuracy and cross-institutional robustness of breast cancer diagnosis. The model achieves a mean AUC of 0.90 across public datasets (CBIS-DDSM, INbreast, BACH, and CAMELYON16/17), outperforming existing unimodal and multimodal approaches, and it produces interpretable attention maps that link the two imaging modalities.

Background

Accurate breast cancer diagnosis often requires synthesizing information from multiple imaging modalities, such as mammography and histopathology. Traditional AI systems typically analyze a single modality and generalize poorly across clinical settings. Combining cross-modal data with domain generalization techniques can improve diagnostic performance and reliability. This study proposes a unified, vision-transformer-based framework that aligns mammographic and histopathological features to improve classification, lesion localization, and pathological grading.

Data Highlights

Dataset	Modality	Performance Metric	Result
CBIS-DDSM	Mammography + Histopathology	Mean AUC	0.90
INbreast	Mammography + Histopathology	Mean AUC	0.90
BACH	Histopathology	Mean AUC	0.90
CAMELYON16/17	Histopathology	Mean AUC	0.90
Domain Gap	Cross-Institutional	Gap Reduction	0.03 vs. 0.06–0.10
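
The last row of the table reports a cross-institutional domain gap of 0.03 against 0.06–0.10, but this summary does not define how the gap is measured. The sketch below assumes one plausible reading: the gap is the AUC on held-out data from the training institutions minus the AUC at an unseen institution. The function names `roc_auc` and `domain_gap` and all scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


def domain_gap(source_labels, source_scores, target_labels, target_scores):
    """Assumed definition: in-domain AUC minus AUC at an unseen institution."""
    return roc_auc(source_labels, source_scores) - roc_auc(target_labels, target_scores)


# Toy scores for illustration only; these are not values from the study.
internal_labels = [0, 0, 1, 1]
internal_scores = [0.10, 0.30, 0.70, 0.90]   # held-out cases, training institution
external_labels = [0, 0, 1, 1]
external_scores = [0.10, 0.40, 0.35, 0.80]   # cases from an unseen institution

gap = domain_gap(internal_labels, internal_scores, external_labels, external_scores)
print(f"domain gap: {gap:.2f}")
```

Under this reading, a smaller gap means that performance degrades less when the model is moved to a new institution.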

Key Findings

The proposed cross-modal framework uses a shared vision transformer encoder with modality-specific adapters to jointly analyze mammography and histopathology images.
Weakly supervised patient-level contrastive alignment enables learning cross-modal correspondences without requiring pixel-level annotations.
Domain generalization strategies, including MixStyle augmentation and invariant risk minimization, reduce domain gaps and improve robustness across institutions.
Causal test-time adaptation further enhances model performance on unseen target domains.
The model simultaneously performs classification, lesion localization, and pathological grading, providing clinically relevant multi-task outputs.
Interpretability analyses demonstrate that attention maps generated by the model align suspicious mammographic regions with corresponding histopathological evidence, supporting clinical trust.
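
The patient-level contrastive alignment listed above can be illustrated with a symmetric InfoNCE loss, a common choice for pairing two modalities. This is a minimal sketch and not the authors' implementation: the summary does not give the exact loss, so the loss form, the `temperature` value of 0.07, and the function name `patient_contrastive_loss` are all assumptions. The only supervision used is that row i of each input comes from the same patient, which is why no pixel-level annotation is needed.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def log_softmax_at(logits, index):
    """Numerically stable log-softmax, evaluated at a single index."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return logits[index] - log_sum


def patient_contrastive_loss(mammo_emb, histo_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch; row i of both inputs is the same patient.

    The temperature is a hypothetical default, not a value from the paper.
    """
    if not mammo_emb or len(mammo_emb) != len(histo_emb):
        raise ValueError("need one embedding per modality for every patient")
    m = [l2_normalize(v) for v in mammo_emb]
    h = [l2_normalize(v) for v in histo_emb]
    n = len(m)
    # sim[i][j]: scaled cosine similarity of patient i's mammogram and patient j's slide
    sim = [[sum(a * b for a, b in zip(m[i], h[j])) / temperature for j in range(n)]
           for i in range(n)]
    # Mammography -> histopathology: pick the matching slide for each mammogram.
    m_to_h = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Histopathology -> mammography: pick the matching mammogram for each slide.
    h_to_m = -sum(log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return 0.5 * (m_to_h + h_to_m)


# Toy 2-D embeddings for two patients (illustrative values only).
mammo = [[1.0, 0.0], [0.0, 1.0]]
histo_matched = [[1.0, 0.0], [0.0, 1.0]]   # each patient's slide agrees with the mammogram
histo_swapped = [[0.0, 1.0], [1.0, 0.0]]   # slides assigned to the wrong patient

loss_matched = patient_contrastive_loss(mammo, histo_matched)
loss_swapped = patient_contrastive_loss(mammo, histo_swapped)
print(loss_matched < loss_swapped)
```

The loss is low when each patient's two embeddings are closer to each other than to any other patient's in the batch, and high when they are mismatched, which is the behaviour the alignment objective is meant to reward.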

Clinical Implications

By combining complementary imaging modalities, this integrated AI framework offers a more reliable and generalizable tool for breast cancer diagnosis than single-modality systems. Its interpretable outputs, which link mammographic findings to histopathology, can assist clinicians in diagnostic decision-making. The improved cross-institutional robustness suggests potential for broader clinical deployment.

Conclusion

By advancing multimodal integration, domain generalization, and explainability, this study presents a clinically promising AI system for breast cancer diagnosis that outperforms current unimodal and multimodal baselines. The framework's interpretability and robustness mark a significant step toward real-world clinical application.
