ESR Essentials: common performance metrics in AI—practice recommendations by the European Society of Medical Imaging Informatics - Scorecard - MDSpire

By Michail E. Klontzas, Kevin B. W. Groot Lipman, Tugba Akinci D’Antonoli, Anna Andreychenko, Renato Cuocolo, Matthias Dietzel, Salvatore Gitto, Henkjan Huisman, João Santinha, Federica Vernuccio, Jacob J. Visser, and Merel Huisman

August 3, 2025

Clinical Scorecard: Key Performance Indicators for AI in Medical Imaging: Practice Guidelines from the European Society of Medical Imaging Informatics

At a Glance

Category | Detail
Condition | Artificial intelligence (AI) tools in medical imaging
Key Mechanisms | Evaluation of AI performance using task-specific metrics including segmentation, detection, classification, calibration, uncertainty quantification, and explainability
Target Population | Patients undergoing medical imaging across diverse clinical settings and demographics
Care Setting | Radiology departments and clinical workflows integrating AI diagnostic tools

Key Highlights

  • Locally validate AI tools beyond CE-marking using independent datasets reflecting institutional protocols and patient demographics.
  • Use a combination of segmentation, test-based, and outcome-based performance metrics to comprehensively assess AI diagnostic accuracy.
  • Consider deployment context by engaging clinicians to define relevant metrics and assess performance across clinically meaningful subgroups.

Guideline-Based Recommendations

Diagnosis

  • Apply task-specific metrics such as Dice similarity coefficient for segmentation and sensitivity/specificity for classification tasks.
  • Assess AI performance at multiple levels (pixel, region, scan, and patient), because a result that looks strong at one level may not carry over to the level that matters clinically.
  • Incorporate calibration metrics (e.g., Brier score) and uncertainty quantification (e.g., conformal prediction) to understand model trustworthiness.
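
To make the task-specific metrics above concrete, here is a minimal Python sketch of the Dice similarity coefficient and of sensitivity and specificity. The function names, the flat 0/1 list representation of masks and labels, and the toy values are illustrative assumptions, not part of the guideline.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks (lists of 0/1)."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p == 1 and t == 1)
    total = sum(pred_mask) + sum(true_mask)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def sensitivity_specificity(pred_labels, true_labels):
    """Return (sensitivity, specificity) for binary labels, where 1 = disease present."""
    pairs = list(zip(pred_labels, true_labels))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy segmentation: 3 predicted voxels, 3 true voxels, 2 overlapping.
print(dice_coefficient([0, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 0]))  # about 0.667

# Toy classification: 3 diseased and 5 healthy scans.
print(sensitivity_specificity([1, 1, 0, 0, 1, 0, 0, 0],
                              [1, 1, 1, 0, 0, 0, 0, 0]))  # about (0.667, 0.8)
```

The same functions can be applied at pixel, region, scan, or patient level by changing what each list element represents.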

Management

  • Engage radiologists and clinicians in metric selection and interpretation to align AI evaluation with clinical goals and workflows.
  • Avoid reliance on single metrics vulnerable to class imbalance; report both test-based and outcome-based metrics.
  • Use independent, institution-specific datasets for local validation to ensure AI performance matches claimed results.
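
The warning about class imbalance can be illustrated with a short calculation. The sketch below assumes that "test-based" refers to sensitivity and specificity and "outcome-based" to positive and negative predictive value (PPV and NPV); the function name `summarize` and all numbers are hypothetical.

```python
def summarize(sensitivity, specificity, prevalence, n=10_000):
    """Expected accuracy, PPV and NPV for n cases at a given disease prevalence."""
    positives = n * prevalence
    negatives = n - positives
    tp = sensitivity * positives
    fn = positives - tp
    tn = specificity * negatives
    fp = negatives - tn
    return {
        "accuracy": (tp + tn) / n,
        "ppv": tp / (tp + fp) if (tp + fp) > 0 else float("nan"),
        "npv": tn / (tn + fn) if (tn + fn) > 0 else float("nan"),
    }


# Same tool (90% sensitivity, 90% specificity) at two prevalences.
print(summarize(0.90, 0.90, prevalence=0.01))  # accuracy 0.90, PPV about 0.08
print(summarize(0.90, 0.90, prevalence=0.30))  # accuracy 0.90, PPV about 0.79

# A tool that never flags disease still reaches 99% accuracy at 1% prevalence.
print(summarize(0.0, 1.0, prevalence=0.01))
```

Accuracy is identical in the first two cases even though the share of positive calls that are correct differs roughly tenfold, which is why a single metric is not enough.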

Monitoring & Follow-up

  • Continuously assess AI performance across vulnerable or clinically meaningful subgroups to detect variability.
  • Monitor calibration and uncertainty metrics to prevent overconfidence in AI predictions.
  • Standardize reporting of performance metrics to facilitate transparent and reproducible AI evaluation.
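
One possible way to monitor subgroups and calibration together is to compute the Brier score and sensitivity separately for each subgroup. The sketch below is illustrative only: the record layout, the subgroup names, and the 0.5 decision threshold are assumptions, not values from the guideline.

```python
from collections import defaultdict


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def metrics_by_subgroup(records, threshold=0.5):
    """records: iterable of (subgroup, predicted_probability, true_label)."""
    grouped = defaultdict(list)
    for subgroup, prob, label in records:
        grouped[subgroup].append((prob, label))

    report = {}
    for subgroup, rows in grouped.items():
        probs = [p for p, _ in rows]
        labels = [y for _, y in rows]
        positive_probs = [p for p, y in rows if y == 1]
        detected = sum(1 for p in positive_probs if p >= threshold)
        report[subgroup] = {
            "n": len(rows),
            "brier": brier_score(probs, labels),
            "sensitivity": detected / len(positive_probs) if positive_probs else float("nan"),
        }
    return report


# Hypothetical predictions from two scanners.
records = [
    ("scanner_A", 0.9, 1), ("scanner_A", 0.8, 1), ("scanner_A", 0.2, 0), ("scanner_A", 0.1, 0),
    ("scanner_B", 0.9, 0), ("scanner_B", 0.4, 1), ("scanner_B", 0.7, 1), ("scanner_B", 0.3, 0),
]
for name, metrics in metrics_by_subgroup(records).items():
    print(name, metrics)
```

A lower Brier score is better. In this toy data the pooled result would hide that the second subgroup is both less sensitive and worse calibrated than the first.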

Risks

  • Inappropriate use or interpretation of metrics can mislead users, leading to overdiagnosis and increased healthcare costs.
  • Lack of standardized metric reporting shifts the burden of interpretation onto end users, risking flawed assessments and unsafe AI integration.
  • Limited real-world implementation of calibration and uncertainty methods may result in overestimation of AI performance.

Patient & Prescribing Data

Patients undergoing diagnostic imaging across various institutions with diverse demographics and disease prevalence

AI tools must be locally validated and evaluated using comprehensive, clinically relevant metrics to ensure reliable diagnostic support and safe integration into patient care pathways.

Clinical Best Practices

  • Validate AI tools locally with datasets that are independent of the development data and that reflect local imaging protocols and patient demographics.
  • Use a combination of segmentation, test-based, and outcome-based metrics to capture different aspects of AI performance.
  • Engage multidisciplinary clinical teams to define relevant performance metrics and interpret results within the deployment context.
  • Incorporate calibration and uncertainty quantification metrics to assess prediction reliability and avoid overconfidence.
  • Report performance metrics transparently and standardize metric usage to support informed clinical decision-making.
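
Conformal prediction, named earlier as an example of uncertainty quantification, can be sketched in its simplest "split" form. This is a minimal illustration under stated assumptions, not the method of any particular product: the function names, the class labels, the nonconformity score (1 minus the predicted probability), and the 25% error level are all hypothetical choices.

```python
import math


def conformal_threshold(cal_probs_true_class, alpha):
    """Threshold on the score 1 - p(true class), from a held-out calibration set."""
    scores = sorted(1.0 - p for p in cal_probs_true_class)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the conformal quantile
    if rank > n:
        return float("inf")  # too few calibration cases: every class is kept
    return scores[rank - 1]


def prediction_set(class_probs, threshold):
    """Keep every class whose score 1 - p does not exceed the threshold."""
    return {label for label, p in class_probs.items() if 1.0 - p <= threshold}


# Hypothetical probabilities the model gave to the TRUE class on 11 local calibration cases.
calibration = [0.98, 0.95, 0.92, 0.90, 0.85, 0.80, 0.70, 0.60, 0.45, 0.30, 0.20]
threshold = conformal_threshold(calibration, alpha=0.25)

print(prediction_set({"benign": 0.90, "malignant": 0.10}, threshold))  # one class: confident
print(prediction_set({"benign": 0.52, "malignant": 0.48}, threshold))  # both classes: uncertain
```

If calibration and new cases are exchangeable, the returned set contains the true class with probability of at least 1 - alpha on average. A set with more than one class is a signal that the case may need closer human review.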
