Clinical Report: Cross-Modal AI Framework for Breast Cancer Diagnosis
Overview
A novel AI framework that integrates mammography and histopathology images improves the accuracy and cross-institutional robustness of breast cancer diagnosis. The model achieves a mean AUC of 0.90 on public datasets (CBIS-DDSM, INbreast, BACH, and CAMELYON16/17), outperforming existing unimodal and multimodal approaches, and it provides interpretable attention maps that link findings across the two imaging modalities.
Background
Accurate breast cancer diagnosis often requires synthesizing information from multiple imaging modalities, such as mammography and histopathology. Traditional AI systems typically analyze single modalities and struggle with generalization across different clinical settings. Integrating cross-modal data with domain generalization techniques can enhance diagnostic performance and reliability. This study proposes a unified vision transformer-based framework that aligns mammographic and histopathological features to improve classification, lesion localization, and pathological grading.
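To make the data flow of such a unified framework concrete, the toy sketch below routes each modality through its own adapter into one shared encoder and then into three task heads. It is only an illustration under stated assumptions: the class name `CrossModalModel`, the layer sizes, the four-value localization output, and the use of plain linear layers in place of a vision transformer are all hypothetical, and placing the adapter before the encoder is one possible arrangement, not necessarily the one used in the study.

```python
import random


def linear(in_dim, out_dim, rng):
    """Return a function applying a random, bias-free linear map to a vector."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def apply(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return apply


class CrossModalModel:
    """Toy stand-in: modality-specific adapters -> shared encoder -> task heads."""

    def __init__(self, input_dims, hidden=8, n_grades=3, seed=0):
        rng = random.Random(seed)
        # One adapter per modality maps raw features into a common space.
        self.adapters = {m: linear(d, hidden, rng) for m, d in input_dims.items()}
        # A single encoder is shared by all modalities (placeholder for a ViT).
        self.shared_encoder = linear(hidden, hidden, rng)
        # Multi-task heads; output sizes are illustrative assumptions.
        self.heads = {
            "classification": linear(hidden, 2, rng),
            "localization": linear(hidden, 4, rng),
            "grading": linear(hidden, n_grades, rng),
        }

    def forward(self, modality, features):
        h = self.adapters[modality](features)
        h = [max(0.0, v) for v in self.shared_encoder(h)]  # ReLU non-linearity
        return {task: head(h) for task, head in self.heads.items()}


if __name__ == "__main__":
    model = CrossModalModel({"mammography": 6, "histopathology": 5})
    print(model.forward("mammography", [0.1] * 6))
```

Because both modalities pass through the same encoder weights, any feature alignment has to happen in the adapters and the shared layers, which is the design idea the sketch is meant to convey.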
Data Highlights
| Dataset | Modality | Performance Metric | Result |
| --- | --- | --- | --- |
| CBIS-DDSM | Mammography + Histopathology | Mean AUC | 0.90 |
| INbreast | Mammography + Histopathology | Mean AUC | 0.90 |
| BACH | Histopathology | Mean AUC | 0.90 |
| CAMELYON16/17 | Histopathology | Mean AUC | 0.90 |
| Domain Gap | Cross-Institutional | Gap Reduction | 0.03 vs. 0.06–0.10 |
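As a rough illustration of how the two kinds of figures reported above could be computed, the sketch below derives a rank-based AUC per dataset, averages it, and measures a cross-institutional gap. The function names, the toy labels and scores, and the definition of the gap as in-domain AUC minus held-out AUC are assumptions made for illustration; the report does not state how its gap was measured.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case (Mann-Whitney formulation; ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(per_dataset):
    """Unweighted average of per-dataset AUCs.

    per_dataset maps a dataset name to a (labels, scores) pair.
    """
    return mean(roc_auc(y, s) for y, s in per_dataset.values())


def domain_gap(in_domain_auc, held_out_auc):
    """Assumed definition: AUC lost when moving to an unseen institution."""
    return in_domain_auc - held_out_auc


if __name__ == "__main__":
    # Made-up predictions for two hypothetical sites, not study data.
    toy = {
        "site_a": ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
        "site_b": ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
    }
    print("mean AUC:", mean_auc(toy))
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a sorted-rank implementation would be preferable for large test sets.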
Key Findings
The proposed cross-modal framework uses a shared vision transformer encoder with modality-specific adapters to jointly analyze mammography and histopathology images.
Domain generalization strategies, including MixStyle augmentation and invariant risk minimization, reduce domain gaps and improve robustness across institutions.
Causal test-time adaptation further enhances model performance on unseen target domains.
The model simultaneously performs classification, lesion localization, and pathological grading, providing clinically relevant multi-task outputs.
Interpretability analyses demonstrate that attention maps generated by the model align suspicious mammographic regions with corresponding histopathological evidence, supporting clinical trust.
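The two domain generalization strategies named in the findings can be sketched in a few lines. The code below is a minimal, dependency-free illustration, not the study's implementation: `mixstyle` mixes per-channel feature statistics between instances in a batch, and `irm_penalty` computes the IRMv1-style squared gradient of a logistic risk with respect to a dummy scale on the logits. Treating each institution as one "environment", the nested-list feature layout, and the default `alpha` and `penalty_weight` values are assumptions.

```python
import math
import random


def channel_stats(feature_map, eps=1e-6):
    """Per-channel (mean, std) over positions for one instance."""
    stats = []
    for channel in feature_map:
        mu = sum(channel) / len(channel)
        var = sum((v - mu) ** 2 for v in channel) / len(channel)
        stats.append((mu, math.sqrt(var + eps)))
    return stats


def mixstyle(batch, alpha=0.1, rng=random):
    """Mix each instance's channel statistics with a random partner's.

    batch[i][c] is the list of activations for instance i, channel c.
    """
    stats = [channel_stats(fm) for fm in batch]
    partners = list(range(len(batch)))
    rng.shuffle(partners)  # a partner may be the instance itself
    mixed = []
    for i, feature_map in enumerate(batch):
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
        out = []
        for c, channel in enumerate(feature_map):
            mu, sd = stats[i][c]
            mu_p, sd_p = stats[partners[i]][c]
            mu_mix = lam * mu + (1 - lam) * mu_p
            sd_mix = lam * sd + (1 - lam) * sd_p
            # Normalise with own statistics, then re-style with the mixed ones.
            out.append([(v - mu) / sd * sd_mix + mu_mix for v in channel])
        mixed.append(out)
    return mixed


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def bce(logits, labels):
    """Mean binary cross-entropy on raw logits (stable softplus form)."""
    total = sum(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
                for z, y in zip(logits, labels))
    return total / len(logits)


def irm_penalty(logits, labels):
    """Squared gradient of the risk w.r.t. a scalar multiplier w on the
    logits, evaluated at w = 1 (the IRMv1 penalty for logistic loss)."""
    grad = sum((sigmoid(z) - y) * z for z, y in zip(logits, labels)) / len(logits)
    return grad ** 2


def irm_objective(environments, penalty_weight=1.0):
    """Sum of per-environment risks plus the weighted invariance penalty.

    environments is a list of (logits, labels) pairs, one per institution.
    """
    risk = sum(bce(z, y) for z, y in environments)
    penalty = sum(irm_penalty(z, y) for z, y in environments)
    return risk + penalty_weight * penalty
```

In a real training loop the style mixing would be applied to intermediate encoder features during training only, and the penalty would be obtained by automatic differentiation instead of the closed-form gradient used here.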
Clinical Implications
By combining complementary imaging modalities, this integrated AI framework offers a more reliable and generalizable tool for breast cancer diagnosis. Its interpretable outputs, which link mammographic findings to histopathological evidence, can support clinicians in diagnostic decision-making. The improved cross-institutional robustness suggests potential for broader clinical deployment.
Conclusion
By advancing multimodal integration, domain generalization, and explainability, this study presents a clinically promising AI system for breast cancer diagnosis that outperforms current unimodal and multimodal baselines. The framework's interpretability and robustness mark a significant step toward real-world clinical application.