Diagnostic performance of artificial intelligence models for pulmonary nodule classification: a multi-model evaluation

By
Sarah K. Herber
Lukas Müller
Daniel Pinto dos Santos
Tobias Jorg
Fabio Souschek
Tobias Bäuerle
Sebastian Foersch
Christian Galata
Peter Mildenberger
Moritz C. Halfmann
July 25, 2025

European Radiology

Overview

This study evaluated three commercial AI models for classifying pulmonary nodules as benign or malignant using histopathology as the gold standard. The models demonstrated varying thresholds and outputs for malignancy risk, highlighting differences in diagnostic performance and applicability to nodules of specific sizes.

Background

Lung cancer remains the leading cause of cancer mortality, largely due to late-stage diagnosis. Early detection of malignant pulmonary nodules via high-resolution CT is critical for improving prognosis. However, increased detection rates have reduced specificity, necessitating improved classification tools. AI models have emerged to automate nodule detection and malignancy risk prediction, potentially enhancing clinical decision-making and patient outcomes. Despite promise, clinical adoption is limited by concerns about generalizability, transparency, and impact on radiologist decisions.

Data Highlights

AI Model	Version	Malignancy Risk Output	Thresholds for Benign/Intermediate/Malignant	Nodule Size Range (mm)
Model 1 (ADVANCE Chest CT)	2.2.1	Continuous 0–100%	<19% benign, 19–62% intermediate, >62% malignant	4–30
Model 2 (InferRead® CT Lung)	1.0.1.1	Continuous 0.0–1.0	<0.1 benign, 0.1–0.9 intermediate, >0.9 malignant	3–30 (only 7 largest nodules scored)
Model 3 (Rayscape Lung CT)	2-1-174-1.278-2.153	Categorical: low, medium, high	No numerical thresholds provided	3–30
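
The thresholds in the table can be read as a simple mapping from each model's raw output to a three-level category. The following is a minimal sketch of that mapping, not any vendor's implementation. The names `THRESHOLDS`, `CATEGORICAL_MAP`, `stratify`, and `scored_by_model_2` are hypothetical. Equating Model 3's low/medium/high labels with benign/intermediate/malignant is an assumption, as is ranking "largest" by diameter in millimetres.

```python
# Bounds of the intermediate band, copied from the table above.
# Values below the lower bound are benign, values above the upper bound are malignant.
THRESHOLDS = {
    "model_1": (19.0, 62.0),  # percent scale, 0-100
    "model_2": (0.1, 0.9),    # probability scale, 0.0-1.0
}

# Assumption: Model 3's categorical labels map one-to-one onto the three categories.
CATEGORICAL_MAP = {"low": "benign", "medium": "intermediate", "high": "malignant"}


def stratify(model, output):
    """Return 'benign', 'intermediate', or 'malignant' for one model output."""
    if model == "model_3":
        return CATEGORICAL_MAP[str(output).lower()]
    lower, upper = THRESHOLDS[model]
    score = float(output)
    if score < lower:
        return "benign"
    if score > upper:
        return "malignant"
    return "intermediate"  # both bounds are inclusive, as in the table


def scored_by_model_2(diameters_mm, limit=7):
    """Pick the nodules Model 2 would score: the `limit` largest within 3-30 mm.

    Assumption: size is compared by diameter in millimetres.
    """
    in_range = [d for d in diameters_mm if 3 <= d <= 30]
    return sorted(in_range, reverse=True)[:limit]


if __name__ == "__main__":
    print(stratify("model_1", 45))      # a 45% score on Model 1's scale
    print(stratify("model_2", 0.95))    # a 0.95 score on Model 2's scale
    print(stratify("model_3", "medium"))
```

Because the two continuous models use different scales (0–100% versus 0.0–1.0), a score must always be interpreted against its own model's thresholds.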

Key Findings

All three AI models automatically detected and classified pulmonary nodules on thoracic CT scans. The 4–30 mm size range is common to all three; Models 2 and 3 are also listed as handling nodules down to 3 mm.
Model 1 provided a continuous malignancy probability score with defined thresholds, enabling stratification into benign, intermediate, and malignant categories.
Model 2 also produced a continuous malignancy score but limited malignancy risk calculation to the seven largest nodules per scan.
Model 3 offered only categorical malignancy risk outputs without continuous scoring or numerical thresholds.
The AI models were primarily trained to detect primary lung cancers. The study therefore included subgroup analyses of nodules between 5 and 8 mm and of the differentiation between primary malignancies and metastases.
Diagnostic accuracy was assessed using ROC curves and AUC values, with statistical significance set at p < 0.05.
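
To illustrate the AUC metric mentioned in the last finding, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one, with ties counted as one half. This is not the study's analysis code. The function name `roc_auc` is hypothetical, and the labels and scores are made-up illustration values, not study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for malignant, 0 for benign on the reference standard
            (histopathology in the study).
    scores: model malignancy scores, where higher means more suspicious.
    """
    malignant = [s for y, s in zip(labels, scores) if y == 1]
    benign = [s for y, s in zip(labels, scores) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign case")

    concordant = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                concordant += 1.0   # correctly ranked pair
            elif m == b:
                concordant += 0.5   # tie counts as half
    return concordant / (len(malignant) * len(benign))


if __name__ == "__main__":
    # Made-up example values for illustration only.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.90, 0.40, 0.35, 0.10, 0.80, 0.50]
    print(f"AUC = {roc_auc(labels, scores):.3f}")
```

The pairwise form is quadratic in the number of cases, which is acceptable for a small illustration; larger datasets would normally use a rank-based or library implementation.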

Clinical Implications

The evaluated AI models have the potential to assist radiologists in classifying pulmonary nodules, particularly by providing malignancy risk stratification that may guide follow-up and biopsy decisions. However, the models differ in output format and thresholds, so their results require careful interpretation within clinical workflows. Limitations such as restricted nodule size ranges and Model 2's limit of seven scored nodules per scan should be considered when integrating these tools into practice.

Conclusion

This assessment highlights the diagnostic capabilities and limitations of current AI models for pulmonary nodule classification, underscoring the need for further validation and standardization to support clinical adoption and improve early detection of lung cancer.

Original Source(s)

Diagnostic performance of artificial intelligence models for pulmonary nodule classification: a multi-model evaluation
