Anatomy-guided visual prompt tuning for cross-modal breast cancer understanding - Report - MDSpire

By Shaorong Zhao, Qingxiang Meng, Yang He, Xiaotong Xu, Jiayao Zhu, Jiawen Qiu, Chao Wu, Yamei Han, Jinhai Deng, Teng Pan, Jingjing Liu

February 13, 2026

Anatomy-Informed Visual Prompt Tuning Enhances Breast Cancer Imaging Analysis

Overview

This study introduces A-VPT, an anatomy-guided visual prompt tuning framework that integrates explicit anatomical priors into Vision Transformer models for breast cancer imaging. A-VPT is reported to achieve state-of-the-art lesion classification and segmentation on mammography, ultrasound, and MRI datasets while tuning less than 2% of the parameters that full fine-tuning would update.

Background

Detecting breast cancer across imaging modalities is challenging because lesions are heterogeneous and their appearance is not consistent from one modality to another. Vision Transformers (ViTs) adapted with parameter-efficient fine-tuning have made model adaptation cheaper, but they rarely incorporate domain-specific anatomical knowledge. Embedding anatomical priors into deep learning models may improve interpretability and generalization. This work proposes integrating glandular, fatty, and ductal tissue information directly into the prompt space of ViTs to enhance cross-modal breast cancer analysis.
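
To make the idea of putting tissue information into the prompt space more concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: it assumes each image patch already carries a tissue label, and it builds one prompt token per tissue by averaging that tissue's patch embeddings. The names `tissue_prompts` and `prepend_prompts`, the toy two-dimensional embeddings, and the pooling rule itself are all hypothetical choices made for illustration.

```python
# Tissue-aware prompts via masked average pooling (illustrative only).
# The tissue names come from the text; everything else is an assumption.

TISSUES = ("glandular", "fatty", "ductal")


def tissue_prompts(patch_tokens, patch_labels):
    """Average the patch embeddings of each tissue type into one prompt token.

    patch_tokens: list of equal-length float lists, one per image patch.
    patch_labels: list of tissue names, one per patch.
    Returns one vector per entry in TISSUES (zeros if a tissue is absent).
    """
    dim = len(patch_tokens[0])
    prompts = []
    for tissue in TISSUES:
        members = [tok for tok, lab in zip(patch_tokens, patch_labels) if lab == tissue]
        if members:
            # Column-wise mean over all patches that belong to this tissue.
            prompt = [sum(col) / len(members) for col in zip(*members)]
        else:
            prompt = [0.0] * dim
        prompts.append(prompt)
    return prompts


def prepend_prompts(patch_tokens, prompts):
    """Build the token sequence a frozen layer would see: prompts, then patches."""
    return prompts + patch_tokens


# Toy input: four patches with 2-dimensional embeddings (hypothetical values).
tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]]
labels = ["glandular", "glandular", "fatty", "fatty"]
prompts = tissue_prompts(tokens, labels)
sequence = prepend_prompts(tokens, prompts)
```

In this sketch the backbone never changes. Only the extra tokens placed in front of the patch sequence carry the anatomical signal, which is the general shape of prompt tuning. A learned prompt generator, as the report describes, would replace the fixed averaging step.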

Data Highlights

Dataset | Modality | Task | Performance | Tunable parameters
INbreast | Mammography | Lesion classification and segmentation | State-of-the-art | <2%
BUSI | Ultrasound | Lesion classification and segmentation | State-of-the-art | <2%
Duke-Breast-MRI | MRI | Lesion classification and segmentation | State-of-the-art | <2%
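
As a rough sanity check on the "<2%" figure, the arithmetic below estimates how many trainable weights per-layer prompt tokens add to a frozen ViT-B/16. The report does not state the backbone size, prompt length, or head design, so every number here is an assumption: a 768-wide, 12-layer backbone with roughly 86 million parameters, a linear classification head, and hypothetical prompt lengths of 10, 50, and 100 tokens per layer. Any additional prompt-generation modules would add to these totals.

```python
# Rough parameter budget for deep visual prompt tuning on a frozen ViT-B/16.
# All sizes below are assumptions for illustration, not figures from the study.

HIDDEN_DIM = 768              # ViT-B/16 token width
NUM_LAYERS = 12               # ViT-B/16 transformer depth
BACKBONE_PARAMS = 86_000_000  # approximate ViT-B/16 size, kept frozen


def prompt_params(tokens_per_layer: int, num_classes: int) -> int:
    """Count trainable weights: per-layer prompt tokens plus a linear head."""
    prompts = NUM_LAYERS * tokens_per_layer * HIDDEN_DIM
    head = HIDDEN_DIM * num_classes + num_classes  # weights + biases
    return prompts + head


def tunable_fraction(tokens_per_layer: int, num_classes: int) -> float:
    """Trainable parameters as a fraction of the frozen backbone."""
    return prompt_params(tokens_per_layer, num_classes) / BACKBONE_PARAMS


if __name__ == "__main__":
    for n_tokens in (10, 50, 100):
        share = tunable_fraction(n_tokens, num_classes=2)
        print(f"{n_tokens:>3} prompt tokens per layer: {share:.2%} of backbone")
```

Under these assumptions even 100 prompt tokens per layer stays close to 1% of the backbone, which suggests a budget below 2% leaves room for some extra trainable components.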

Key Findings

  • A-VPT dynamically generates tissue-aware prompts guided by glandular, fatty, and ductal region embeddings within a frozen Vision Transformer backbone.
  • Hierarchical prompt-token interactions across transformer layers enhance anatomical semantic integration.
  • Cross-modal contrastive alignment harmonizes anatomical semantics among mammography, ultrasound, and MRI modalities.
  • A-VPT achieves state-of-the-art lesion classification and segmentation performance on three benchmark datasets while tuning less than 2% of the parameters that full fine-tuning would update.
  • Qualitative analyses reveal interpretable attention patterns consistent with radiological anatomical structures.
  • Embedding anatomical priors improves model efficiency, generalization, and interpretability, bridging deep learning with human anatomical reasoning.
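
The cross-modal contrastive alignment in the findings above can be illustrated with a small sketch. The report does not specify the loss, so this example assumes a symmetric InfoNCE objective, a common choice for contrastive alignment: the embedding of a tissue type from one modality should be closer to the same tissue's embedding from another modality than to any other tissue's. The function names, the temperature of 0.1, and the toy three-dimensional embeddings are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss where anchors[i] should match positives[i]."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_alignment_loss(emb_a, emb_b, temperature=0.1):
    """Average the loss in both directions (modality A to B and B to A)."""
    return 0.5 * (info_nce(emb_a, emb_b, temperature)
                  + info_nce(emb_b, emb_a, temperature))


# Toy tissue embeddings (glandular, fatty, ductal) from two modalities.
mammo = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ultra = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled = ultra[1:] + ultra[:1]  # deliberately mismatched tissue order

aligned_loss = symmetric_alignment_loss(mammo, ultra)
mismatched_loss = symmetric_alignment_loss(mammo, shuffled)
```

When the tissue embeddings agree across modalities the loss is small, and when the pairing is shuffled it grows. Minimizing it therefore pulls matching anatomical semantics together across mammography, ultrasound, and MRI.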

Clinical Implications

Incorporating explicit anatomical knowledge into AI models can improve breast cancer detection accuracy across multiple imaging modalities while lowering the computational cost of adapting a model to each one. Attention maps that align with anatomical structures may increase clinician trust and ease integration into diagnostic workflows. The approach also supports generalization across imaging domains, which could make early breast cancer diagnosis more reliable.

Conclusion

Anatomy-guided visual prompt tuning represents a promising strategy to improve cross-modal breast cancer imaging analysis by embedding domain-specific anatomical priors. This method advances both performance and interpretability while maintaining parameter efficiency.
