ESR Essentials: common performance metrics in AI—practice recommendations by the European Society of Medical Imaging Informatics

By
Michail E. Klontzas
Kevin B. W. Groot Lipman
Tugba Akinci D’ Antonoli
Anna Andreychenko
Renato Cuocolo
Matthias Dietzel
Salvatore Gitto
Henkjan Huisman
João Santinha
Federica Vernuccio
Jacob J. Visser
Merel Huisman
August 3, 2025

European Radiology

At a Glance

Category	Detail
Condition	Artificial intelligence (AI) tools in medical imaging
Key Mechanisms	Evaluation of AI performance using task-specific metrics including segmentation, detection, classification, calibration, uncertainty quantification, and explainability
Target Population	Patients undergoing medical imaging across diverse clinical settings and demographics
Care Setting	Radiology departments and clinical workflows integrating AI diagnostic tools

Key Highlights

Locally validate AI tools, even CE-marked ones, using independent datasets that reflect institutional protocols and patient demographics.
Use a combination of segmentation, test-based, and outcome-based performance metrics to comprehensively assess AI diagnostic accuracy.
Consider deployment context by engaging clinicians to define relevant metrics and assess performance across clinically meaningful subgroups.

Guideline-Based Recommendations

Diagnosis

Apply task-specific metrics, such as the Dice similarity coefficient for segmentation and sensitivity/specificity for classification.
Assess AI performance at multiple levels (pixel, region, scan, and patient) to capture clinical relevance.
Incorporate calibration metrics (e.g., Brier score) and uncertainty quantification (e.g., conformal prediction) to understand model trustworthiness.
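The Dice coefficient and sensitivity/specificity named above take only a few lines to compute. The sketch below is a minimal, illustrative Python version, not the guideline authors' implementation. It assumes binary masks flattened into sequences of 0/1 values, and the helper names (dice, sensitivity_specificity, to_patient_level) are hypothetical. The patient-level helper assumes a patient counts as positive when any of their scans is positive, which is only one possible aggregation rule. The toy example shows how identical predictions can yield different figures at scan level and patient level.

```python
from collections import defaultdict


def dice(pred_mask, true_mask):
    """Dice similarity coefficient for two binary masks flattened to sequences of 0/1."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must contain the same number of pixels")
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2 * overlap / total


def sensitivity_specificity(y_pred, y_true):
    """Sensitivity and specificity from binary predictions and reference labels."""
    pairs = list(zip(y_pred, y_true))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def to_patient_level(patient_ids, scan_pred, scan_true):
    """Collapse scan-level labels to patient level (hypothetical rule: positive if any scan is)."""
    pred, true = defaultdict(bool), defaultdict(bool)
    for pid, p, t in zip(patient_ids, scan_pred, scan_true):
        pred[pid] |= bool(p)
        true[pid] |= bool(t)
    ids = sorted(pred)
    return [pred[i] for i in ids], [true[i] for i in ids]


if __name__ == "__main__":
    print(dice([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.6666666666666666
    ids = ["A", "A", "B", "B", "C"]
    scan_pred = [1, 0, 0, 0, 1]
    scan_true = [1, 1, 0, 0, 0]
    print(sensitivity_specificity(scan_pred, scan_true))  # (0.5, 0.6666666666666666)
    print(sensitivity_specificity(*to_patient_level(ids, scan_pred, scan_true)))  # (1.0, 0.5)
```

In this toy example, scan-level sensitivity is 0.5 and patient-level sensitivity is 1.0. The difference arises because a correctly flagged scan from patient A hides the missed scan from the same patient. This is why the level at which a metric is reported should be stated explicitly.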

Management

Engage radiologists and clinicians in metric selection and interpretation to align AI evaluation with clinical goals and workflows.
Avoid relying on a single metric, particularly one vulnerable to class imbalance; report both test-based and outcome-based metrics.
Use independent, institution-specific datasets for local validation to ensure AI performance matches claimed results.
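A small calculation shows why a single metric can mislead under class imbalance and why test-based and outcome-based metrics belong together. This sketch assumes, as is common in diagnostic accuracy work, that test-based metrics are sensitivity and specificity, which do not depend on prevalence. It also assumes that outcome-based metrics are the positive and negative predictive values (PPV, NPV), which do depend on prevalence. The function names are hypothetical.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Test-based (sensitivity, specificity) and outcome-based (PPV, NPV) metrics, plus accuracy."""
    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when the denominator is zero

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Expected PPV at a given disease prevalence, via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # 1,000 scans, 10 with disease: a "model" that calls every scan negative.
    m = confusion_metrics(tp=0, fp=0, tn=990, fn=10)
    print(m["accuracy"], m["sensitivity"])  # 0.99 0.0
    # Same sensitivity and specificity, two different local prevalences.
    for prev in (0.01, 0.20):
        print(prev, round(ppv_at_prevalence(0.9, 0.9, prev), 3))
```

A model that labels every scan negative reaches 99% accuracy at 1% prevalence while detecting no disease. The same sensitivity and specificity of 0.9 give a PPV of about 0.08 at 1% prevalence and about 0.69 at 20%. For this reason, performance claimed elsewhere may not transfer to a site with a different disease prevalence.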

Monitoring & Follow-up

Continuously assess AI performance across vulnerable or clinically meaningful subgroups to detect variability in performance.
Monitor calibration and uncertainty metrics to prevent overconfidence in AI predictions.
Standardize reporting of performance metrics to facilitate transparent and reproducible AI evaluation.
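Subgroup monitoring of calibration and uncertainty can be sketched with the Brier score and split conformal prediction, both mentioned in the recommendations. The code below is an illustrative sketch for a binary classifier that outputs a probability of disease. It assumes a held-out local calibration set that is exchangeable with the monitored data, and it uses the common nonconformity score "1 minus the predicted probability of the true class". The names (conformal_threshold, prediction_set, monitor_by_subgroup) and the site labels are hypothetical. Conformal coverage is guaranteed only on average over the whole population, not within each subgroup, so per-subgroup coverage is worth checking.

```python
import math
from collections import defaultdict


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal threshold for a binary classifier; alpha is the target error rate."""
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = sorted(1 - p if y == 1 else p for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    return 1.0 if k > n else scores[k - 1]  # too few calibration cases: include every class


def prediction_set(prob, qhat):
    """Classes whose nonconformity score does not exceed the calibrated threshold."""
    labels = set()
    if 1 - prob <= qhat:
        labels.add(1)
    if prob <= qhat:
        labels.add(0)
    return labels


def monitor_by_subgroup(groups, probs, labels, qhat):
    """Per-subgroup case count, Brier score and empirical coverage of the prediction sets."""
    rows_by_group = defaultdict(list)
    for g, p, y in zip(groups, probs, labels):
        rows_by_group[g].append((p, y))
    report = {}
    for g, rows in sorted(rows_by_group.items()):
        ps = [p for p, _ in rows]
        ys = [y for _, y in rows]
        covered = sum(y in prediction_set(p, qhat) for p, y in rows)
        report[g] = {"n": len(rows), "brier": brier_score(ps, ys), "coverage": covered / len(rows)}
    return report


if __name__ == "__main__":
    # Hypothetical local calibration data (probability of disease, reference label).
    cal_probs = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.6, 0.4, 0.95, 0.05]
    cal_labels = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]
    qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
    print(prediction_set(0.5, qhat))  # ambiguous case: both classes returned
    report = monitor_by_subgroup(
        ["site_A", "site_A", "site_B", "site_B"], [0.9, 0.2, 0.5, 0.3], [1, 0, 1, 1], qhat
    )
    for group, stats in report.items():
        print(group, stats)
```

Here site_B has a worse Brier score and lower empirical coverage than site_A, the kind of subgroup discrepancy that monitoring is meant to surface. With real data, each subgroup would need far more cases than this toy example for the estimates to be meaningful.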

Risks

Inappropriate use or interpretation of metrics can mislead users, leading to overdiagnosis and increased healthcare costs.
Lack of standardized metric reporting places a burden on end-users, risking flawed assessments and unsafe AI integration.
Limited real-world implementation of calibration and uncertainty methods may result in overestimation of AI performance.

Patient & Prescribing Data

Patients undergoing diagnostic imaging across various institutions, with diverse demographics and disease prevalence.

AI tools must be locally validated and evaluated using comprehensive, clinically relevant metrics to ensure reliable diagnostic support and safe integration into patient care pathways.

Clinical Best Practices

Validate AI tools locally with datasets that are independent of the development data and reflect local imaging protocols and patient demographics.
Use a combination of segmentation, test-based, and outcome-based metrics to capture different aspects of AI performance.
Engage multidisciplinary clinical teams to define relevant performance metrics and interpret results within the deployment context.
Incorporate calibration and uncertainty quantification metrics to assess prediction reliability and avoid overconfidence.
Report performance metrics transparently and standardize metric usage to support informed clinical decision-making.

Original Source(s)

ESR Essentials: common performance metrics in AI—practice recommendations by the European Society of Medical Imaging Informatics