Performance across different versions of an artificial intelligence model for screen-reading of mammograms

By Marthe Larsen, Christoph I. Lee, Marie B. Bergan, Åsne S. Holen, Håkon Lund-Hanssen, Solveig R. Hoff, Steinar Auensen, Jan F. Nygård, Kristina Lång, Yan Chen, Giske Ursin, Solveig Hofvind

January 13, 2026

Evaluation of AI Model Version Updates for Mammogram Analysis in BreastScreen Norway

Overview

This study compared two versions of a commercial AI model for mammography interpretation in a large national screening program. The updated version assigned malignancy risk scores differently from its predecessor, which changed how screen-detected and interval cancers were distributed across score categories. The study also analyzed how tumor characteristics and mammographic features related to the AI risk scores.

Background

Breast cancer screening with mammography is evolving with the integration of artificial intelligence (AI) to improve interpretation accuracy and outcomes. Several AI models have regulatory clearance and are used to support radiologists by triaging exams or supplementing double reading. However, challenges remain regarding the impact of AI software updates on screening performance, ethical and legal considerations, and implementation costs. Understanding how successive AI model versions affect clinical outcomes is critical for optimizing screening programs.

Data Highlights

Parameter | Version 1.7 | Version 2.1
Number of Screening Exams | ~117,709 | ~117,709
AI Risk Score Categories | 1–7 low, 8–9 intermediate, 10 high | Same categorization
Screen-Detected Cancer Definition | Invasive cancer or DCIS after recall | Same
Interval Cancer Definition | Invasive cancer or DCIS within 24 months post-negative screen | Same
AI Model Updates | Original algorithm | Architectural changes, expanded training data, updated sampling
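
The score categories and the interval cancer definition listed above can be written as two small helpers. This is only an illustrative sketch in Python: the function names are hypothetical, the 24-month window is assumed to be inclusive of its last day, and none of this is taken from the study's own analysis code.

```python
from datetime import date


def risk_category(score: int) -> str:
    """Map an AI risk score (1-10) to low / intermediate / high."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score <= 7:
        return "low"
    if score <= 9:
        return "intermediate"
    return "high"


def window_end(negative_screen: date) -> date:
    """Return the date 24 months after a negative screen."""
    try:
        return negative_screen.replace(year=negative_screen.year + 2)
    except ValueError:
        # Screen on Feb 29: the target year has no Feb 29, so use Feb 28.
        return negative_screen.replace(year=negative_screen.year + 2, day=28)


def is_interval_cancer(negative_screen: date, diagnosis: date) -> bool:
    """True if the diagnosis falls after a negative screen and within 24 months.

    Assumption: the last day of the 24-month window counts as inside it.
    """
    return negative_screen < diagnosis <= window_end(negative_screen)
```

For example, `risk_category(8)` gives `"intermediate"`, and a cancer diagnosed on 1 June 2021 after a negative screen on 15 January 2020 would count as an interval cancer under these assumptions.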

Key Findings

  • Version 2.1 of the AI model incorporated architectural algorithm changes and expanded training data from multiple vendors and centers.
  • AI malignancy risk scores from version 2.1 differed from those from version 1.7, so the same exam could fall into a different risk category depending on the version used.
  • Screen-detected and interval cancer rates within each risk category varied between the two versions, indicating an effect on screening outcomes.
  • Histopathological tumor characteristics and mammographic features showed differences in distribution relative to AI risk scores between versions.
  • Version 2.1 demonstrated improved sensitivity compared to humans and prior AI versions in detecting malignancies.
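
One way to see how scores shift between versions is to cross-tabulate the risk category each exam receives under version 1.7 against its category under version 2.1. The sketch below assumes paired scores for the same exams. The score lists are made-up illustrative values, not study data, and the function names are hypothetical.

```python
from collections import Counter


def risk_category(score: int) -> str:
    """Map an AI risk score (1-10) to low / intermediate / high."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score <= 7:
        return "low"
    if score <= 9:
        return "intermediate"
    return "high"


def category_shift(scores_old, scores_new):
    """Count exams by (old category, new category) for paired score lists."""
    if len(scores_old) != len(scores_new):
        raise ValueError("both versions must score the same exams")
    return Counter(
        (risk_category(old), risk_category(new))
        for old, new in zip(scores_old, scores_new)
    )


# Hypothetical scores for five exams (NOT study data).
scores_v17 = [3, 8, 10, 7, 9]
scores_v21 = [2, 10, 10, 8, 9]

shift = category_shift(scores_v17, scores_v21)
for (old_cat, new_cat), n in sorted(shift.items()):
    print(f"{old_cat:>12} -> {new_cat:<12} {n}")
```

Off-diagonal cells such as (intermediate, high) are the exams whose category changed with the update. These are the cases a screening program would likely want to review before switching versions.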

Clinical Implications

Clinicians should be aware that updates to AI mammography models can change malignancy risk scoring and influence screening performance metrics. Continuous validation and quality assurance are essential when implementing new AI software versions to maintain or improve cancer detection rates and minimize false positives. Understanding these changes aids in optimizing recall decisions and resource allocation in breast cancer screening programs.
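
As a rough illustration of the kind of check a quality assurance process might run, the sketch below compares the fraction of known cancers that each version places at or above a chosen score threshold. The threshold of 10 corresponds to the "high" category, but using it as an operating point is an assumption made for this example. The score lists are invented, not study results.

```python
def fraction_flagged(cancer_scores, threshold=10):
    """Fraction of cancers with an AI risk score at or above the threshold."""
    if not cancer_scores:
        raise ValueError("need at least one score")
    flagged = sum(score >= threshold for score in cancer_scores)
    return flagged / len(cancer_scores)


# Hypothetical AI scores for the same four cancers (NOT study data).
cancer_scores_v17 = [10, 10, 9, 4]
cancer_scores_v21 = [10, 10, 10, 6]

frac_v17 = fraction_flagged(cancer_scores_v17)
frac_v21 = fraction_flagged(cancer_scores_v21)
print(f"v1.7: {frac_v17:.2f}  v2.1: {frac_v21:.2f}")
```

The same comparison run on cancer-free exams would show how many extra exams a threshold flags, which bears on recall workload.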

Conclusion

This study highlights that version updates to commercial AI models for mammography can change interpretive performance and screening outcomes. Ongoing evaluation of each new AI software version is crucial to ensure consistent or improved breast cancer detection in population-based screening.

References

  1. BreastScreen Norway Program Data and Ethics Approvals
  2. ScreenPoint Medical BV - Transpara AI Model Versions 1.7 and 2.1
  3. St. Gallen Molecular Subtype Classification
