Anatomy-Informed Visual Prompt Tuning Enhances Breast Cancer Imaging Analysis
Overview
This study introduces A-VPT, an anatomy-guided visual prompt tuning framework that integrates explicit anatomical priors into Vision Transformer models for breast cancer imaging. A-VPT reports state-of-the-art lesion classification and segmentation on mammography, ultrasound, and MRI datasets while tuning fewer than 2% of the parameters that full fine-tuning would update.
Background
Breast cancer detection across imaging modalities is challenging because of lesion heterogeneity and a lack of cross-domain consistency. Vision Transformers (ViTs) with parameter-efficient fine-tuning have advanced model adaptation, but they often do not incorporate domain-specific anatomical knowledge. Embedding anatomical priors into deep learning models may improve interpretability and generalization. This work proposes integrating glandular, fatty, and ductal tissue information directly into the prompt space of ViTs to enhance cross-modal breast cancer analysis.
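To make "integrating tissue information into the prompt space" concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes that each image comes with soft tissue fractions (for example from a coarse tissue map), that each tissue type has a tunable embedding, and that prompts are formed by shifting tunable base prompts with the fraction-weighted tissue embedding. The names `make_tissue_prompts` and `prepend_prompts`, the toy sizes, and the fraction values are all hypothetical.

```python
import random

TISSUES = ("glandular", "fatty", "ductal")  # tissue types named in the text


def make_tissue_prompts(base_prompts, tissue_embeddings, tissue_fractions):
    """Shift each tunable base prompt by the fraction-weighted tissue embedding."""
    dim = len(base_prompts[0])
    mix = [
        sum(tissue_fractions[t] * tissue_embeddings[t][i] for t in TISSUES)
        for i in range(dim)
    ]
    return [[p[i] + mix[i] for i in range(dim)] for p in base_prompts]


def prepend_prompts(prompts, patch_tokens):
    """Build the input sequence: prompt tokens first, then the patch tokens."""
    return prompts + patch_tokens


rng = random.Random(0)
DIM, N_PROMPTS, N_PATCHES = 4, 2, 9  # toy sizes, chosen only for illustration


def rand_vec():
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


base_prompts = [rand_vec() for _ in range(N_PROMPTS)]   # tunable (assumed)
tissue_embeddings = {t: rand_vec() for t in TISSUES}     # tunable (assumed)
patch_tokens = [rand_vec() for _ in range(N_PATCHES)]    # stand-in for frozen backbone tokens

# Hypothetical tissue fractions for one image; they sum to 1.
fractions = {"glandular": 0.5, "fatty": 0.4, "ductal": 0.1}

prompts = make_tissue_prompts(base_prompts, tissue_embeddings, fractions)
sequence = prepend_prompts(prompts, patch_tokens)
print(len(sequence), "tokens of dimension", len(sequence[0]))
```

In this sketch only `base_prompts` and `tissue_embeddings` would receive gradients; the patch tokens stand in for the output of the frozen backbone's embedding layer.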
Data Highlights
| Dataset | Modality | Task | Performance | Tunable Parameters (% of full fine-tuning) |
| --- | --- | --- | --- | --- |
| INbreast | Mammography | Lesion Classification & Segmentation | State-of-the-art | <2 |
| BUSI | Ultrasound | Lesion Classification & Segmentation | State-of-the-art | <2 |
| Duke-Breast-MRI | MRI | Lesion Classification & Segmentation | State-of-the-art | <2 |
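For a sense of scale, the sketch below works out what a 2% budget could mean in absolute terms. The source does not state the backbone size or the number of prompt tokens, so the ViT-B/16 dimensions (width 768, 12 blocks, 224x224 input) and the figure of 10 prompt tokens per layer are assumptions made only for this arithmetic. The count covers prompt tokens alone; any prompt generator or task head would add to it.

```python
def vit_encoder_params(dim=768, depth=12, mlp_ratio=4, patch=16, channels=3, image=224):
    """Parameter count of a standard ViT encoder without a classification head."""
    attn = 4 * dim * dim + 4 * dim                            # QKV and output projection, with biases
    mlp = 2 * mlp_ratio * dim * dim + mlp_ratio * dim + dim   # two linear layers, with biases
    norms = 2 * 2 * dim                                       # two LayerNorms per block
    per_block = attn + mlp + norms
    patch_embed = patch * patch * channels * dim + dim
    tokens = (image // patch) ** 2 + 1                        # patches plus class token
    pos_embed = tokens * dim
    cls_token = dim
    final_norm = 2 * dim
    return depth * per_block + patch_embed + pos_embed + cls_token + final_norm


def prompt_params(prompts_per_layer, depth=12, dim=768):
    """Parameters in learnable prompt tokens inserted at every block."""
    return prompts_per_layer * depth * dim


backbone = vit_encoder_params()
budget = 0.02 * backbone
prompts = prompt_params(prompts_per_layer=10)  # hypothetical prompt count

print(f"frozen backbone: {backbone:,} parameters")
print(f"2% budget:       {budget:,.0f} parameters")
print(f"prompt tokens:   {prompts:,} parameters ({100 * prompts / backbone:.2f}% of backbone)")
```

Under these assumptions the backbone has roughly 85.8 million parameters, so a 2% budget is about 1.7 million, and per-layer prompt tokens alone use only a small part of it.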
Key Findings
A-VPT dynamically generates tissue-aware prompts guided by glandular, fatty, and ductal region embeddings within a frozen Vision Transformer backbone.
Hierarchical prompt-token interactions across transformer layers enhance anatomical semantic integration.
Cross-modal contrastive alignment harmonizes anatomical semantics among mammography, ultrasound, and MRI modalities.
A-VPT achieves state-of-the-art lesion classification and segmentation performance on three benchmark datasets while tuning fewer than 2% of the parameters updated in full fine-tuning.
Embedding anatomical priors improves model efficiency, generalization, and interpretability, bridging deep learning with human anatomical reasoning.
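The cross-modal contrastive alignment mentioned above can be illustrated with a small sketch. The source does not specify the loss, so an InfoNCE-style objective is assumed here: the embedding of a tissue type in one modality should be closer to the same tissue type in another modality than to the other tissue types. The function name `info_nce`, the temperature, and the toy embeddings are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i] against all other positives."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(a)


# Toy tissue embeddings in the order glandular, fatty, ductal (values are made up).
mammography = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
ultrasound = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

aligned = info_nce(mammography, ultrasound)
shuffled = info_nce(mammography, ultrasound[1:] + ultrasound[:1])  # tissue types mismatched
print(f"aligned loss:  {aligned:.4f}")
print(f"shuffled loss: {shuffled:.4f}")
```

The loss is lower when matching tissue types are paired across modalities than when they are mismatched, which is the behavior an alignment objective of this kind rewards. Extending the sketch to three modalities would mean summing the loss over each pair of modalities.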
Clinical Implications
Incorporating explicit anatomical knowledge into AI models can enhance breast cancer detection accuracy across multiple imaging modalities while reducing computational cost. Interpretable attention maps that align with anatomical structures may increase clinician trust and ease integration into diagnostic workflows. The approach supports robust multi-domain generalization, which could contribute to earlier and more reliable breast cancer diagnosis.
Conclusion
Anatomy-guided visual prompt tuning represents a promising strategy to improve cross-modal breast cancer imaging analysis by embedding domain-specific anatomical priors. This method advances both performance and interpretability while maintaining parameter efficiency.
References
Moreira et al. 2012 -- INbreast: toward a full-field digital mammographic database
Al-Dhabyani et al. 2020 -- Dataset of breast ultrasound images
Saha et al. 2021 -- Dynamic contrast-enhanced magnetic resonance images of breast cancer patients