Diagnostic performance of artificial intelligence models for pulmonary nodule classification: a multi-model evaluation - Report - MDSpire

By Sarah K. Herber, Lukas Müller, Daniel Pinto dos Santos, Tobias Jorg, Fabio Souschek, Tobias Bäuerle, Sebastian Foersch, Christian Galata, Peter Mildenberger, Moritz C. Halfmann

July 25, 2025

Clinical Report: AI Models for Pulmonary Nodule Classification Accuracy

Overview

This study evaluated three commercial AI models for classifying pulmonary nodules as benign or malignant, using histopathology as the gold standard. The models report malignancy risk in different formats and with different thresholds, and they differ in diagnostic performance and in the nodule sizes they can score.

Background

Lung cancer remains the leading cause of cancer mortality, largely due to late-stage diagnosis. Early detection of malignant pulmonary nodules via high-resolution CT is critical for improving prognosis. However, higher detection rates have come with lower specificity, so better classification tools are needed. AI models have emerged to automate nodule detection and malignancy risk prediction, potentially enhancing clinical decision-making and patient outcomes. Despite this promise, clinical adoption is limited by concerns about generalizability, transparency, and impact on radiologist decisions.

Data Highlights

| AI Model | Version | Malignancy Risk Output | Thresholds for Benign/Intermediate/Malignant | Nodule Size Range (mm) |
|---|---|---|---|---|
| Model 1 (ADVANCE Chest CT) | 2.2.1 | Continuous 0–100% | <19% benign, 19–62% intermediate, >62% malignant | 4–30 |
| Model 2 (InferRead® CT Lung) | 1.0.1.1 | Continuous 0.0–1.0 | <0.1 benign, 0.1–0.9 intermediate, >0.9 malignant | 3–30 (only 7 largest nodules scored) |
| Model 3 (Rayscape Lung CT) | 2-1-174-1.278-2.153 | Categorical: low, medium, high | No numerical thresholds provided | 3–30 |
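
The thresholds in the table above can be read as a simple three-way mapping. The following Python sketch is illustrative only: the function and dictionary names are hypothetical, it assumes values falling exactly on a cutoff (for example 19% or 62%) count as intermediate, and it assumes Model 3's low/medium/high categories line up with benign/intermediate/malignant, which the report does not state.

```python
# Cutoffs copied from the Data Highlights table; all names here are hypothetical.
THRESHOLDS = {
    "model_1": (19.0, 62.0),  # percent scale, 0-100
    "model_2": (0.1, 0.9),    # probability scale, 0.0-1.0
}

# Assumption: Model 3's categories correspond to the three-way scheme.
CATEGORY_MAP = {"low": "benign", "medium": "intermediate", "high": "malignant"}


def stratify(model, output):
    """Map a model output to 'benign', 'intermediate', or 'malignant'."""
    if model == "model_3":
        return CATEGORY_MAP[output]
    lower, upper = THRESHOLDS[model]
    if output < lower:
        return "benign"
    if output > upper:
        return "malignant"
    # Assumption: values on either cutoff fall into the intermediate band.
    return "intermediate"


print(stratify("model_1", 45.0))   # intermediate
print(stratify("model_2", 0.95))   # malignant
print(stratify("model_3", "low"))  # benign
```

Putting all three models on one scale like this makes side-by-side comparison possible, but it does not make the underlying scores equivalent: a 45% score from Model 1 and a 0.45 score from Model 2 come from different algorithms with different cutoffs.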

Key Findings

  • All three AI models automatically detected and classified pulmonary nodules on thoracic CT scans across a shared size range of 4 to 30 mm; Models 2 and 3 also accepted nodules down to 3 mm.
  • Model 1 provided a continuous malignancy probability score with defined thresholds, enabling stratification into benign, intermediate, and malignant categories.
  • Model 2 also produced a continuous malignancy score but limited malignancy risk calculation to the seven largest nodules per scan.
  • Model 3 offered only categorical malignancy risk outputs without continuous scoring or numerical thresholds.
  • The AI models were primarily trained to detect primary lung cancers. Subgroup analyses addressed nodules between 5 and 8 mm and the differentiation between primary malignancies and metastases.
  • Diagnostic accuracy was assessed using ROC curves and AUC values, with statistical significance set at p < 0.05.
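
For readers unfamiliar with AUC, the sketch below shows one standard way to compute it: as the fraction of malignant/benign nodule pairs in which the malignant nodule receives the higher score, with ties counted as half. The report does not say how the study computed its AUC values, so this is an assumption-laden illustration; the function name is hypothetical and the data are made up, not study results.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a malignant nodule outscores a benign one.

    labels: 1 = malignant, 0 = benign (per histopathology).
    scores: model malignancy scores; ties between a pair count as half.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign nodule")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up illustrative data, not results from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.30, 0.40, 0.10, 0.05]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly
```

The same function could be applied to a categorical output such as Model 3's by coding low, medium, and high as 0, 1, and 2 (an assumption, not the study's stated method); with only three levels, many pairs tie and contribute half a point each.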

Clinical Implications

The evaluated AI models demonstrate potential to assist radiologists in classifying pulmonary nodules, particularly by providing malignancy risk stratification that may guide follow-up and biopsy decisions. However, differences in output formats and thresholds necessitate careful interpretation within clinical workflows. Limitations such as restricted nodule size ranges and scoring constraints should be considered when integrating these tools into practice.
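
To make the size-range and scoring constraints concrete, the Python sketch below works out which nodules in a single scan would receive a malignancy score under each constraint. It is a hypothetical illustration: the function name and nodule diameters are invented, the size limits are assumed to be inclusive, and the seven-nodule cap is assumed to apply after the size filter, none of which the report specifies.

```python
def scored_nodules(diameters_mm, min_mm, max_mm, max_scored=None):
    """Return the nodule diameters that would receive a malignancy score.

    Assumptions: size limits are inclusive, and any cap on the number of
    scored nodules keeps the largest ones after size filtering.
    """
    in_range = sorted(
        (d for d in diameters_mm if min_mm <= d <= max_mm), reverse=True
    )
    if max_scored is not None:
        in_range = in_range[:max_scored]
    return in_range


# Hypothetical scan with ten nodules (diameters in mm).
scan = [3.5, 4.2, 5.0, 6.1, 7.4, 8.0, 9.3, 12.5, 18.0, 32.0]

model_1 = scored_nodules(scan, 4, 30)                 # 4-30 mm, no cap
model_2 = scored_nodules(scan, 3, 30, max_scored=7)   # 3-30 mm, 7 largest

print(len(model_1), model_1)
print(len(model_2), model_2)
```

In this invented scan, the 4–30 mm range scores eight nodules and misses the 3.5 mm and 32 mm ones, while the 3–30 mm range with a seven-nodule cap leaves the two smallest in-range nodules (3.5 mm and 4.2 mm) unscored. A radiologist comparing outputs would therefore need to check which nodules each tool actually assessed.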

Conclusion

This comprehensive assessment highlights the diagnostic capabilities and limitations of current AI models for pulmonary nodule classification, underscoring the need for further validation and standardization to support clinical adoption and improve lung cancer early detection.
