Clinical Report: AI Models for Pulmonary Nodule Classification Accuracy
Overview
This study evaluated three commercial AI models for classifying pulmonary nodules as benign or malignant, using histopathology as the gold standard. The models differ in how they report malignancy risk (a continuous score or a categorical label), in the thresholds they apply, and in the nodule sizes they cover; these differences bear on both diagnostic performance and applicability to nodules of specific sizes.
Background
Lung cancer remains the leading cause of cancer mortality, largely due to late-stage diagnosis. Early detection of malignant pulmonary nodules via high-resolution CT is critical for improving prognosis. However, increased detection rates have reduced specificity, necessitating improved classification tools. AI models have emerged to automate nodule detection and malignancy risk prediction, potentially enhancing clinical decision-making and patient outcomes. Despite promise, clinical adoption is limited by concerns about generalizability, transparency, and impact on radiologist decisions.
Data Highlights
| AI Model | Version | Malignancy Risk Output | Thresholds for Benign/Intermediate/Malignant | Nodule Size Range (mm) |
| --- | --- | --- | --- | --- |
| Model 1 (ADVANCE Chest CT) | 2.2.1 | Continuous 0–100% | <19% benign, 19–62% intermediate, >62% malignant | 4–30 |
| Model 2 (InferRead® CT Lung) | 1.0.1.1 | Continuous 0.0–1.0 | <0.1 benign, 0.1–0.9 intermediate, >0.9 malignant | 3–30 (only 7 largest nodules scored) |
| Model 3 (Rayscape Lung CT) | 2-1-174-1.278-2.153 | Categorical: low, medium, high | No numerical thresholds provided | 3–30 |
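The thresholds listed for Models 1 and 2 can be expressed as a small stratification routine. The following is a minimal sketch, not vendor code: the function names are hypothetical, and treating the boundary values (19% and 62%, 0.1 and 0.9) as intermediate is an assumption read from the ranges given above. Model 3 is omitted because it reports only categorical labels without numerical thresholds.

```python
from typing import Literal

Category = Literal["benign", "intermediate", "malignant"]


def stratify(score: float, lower: float, upper: float) -> Category:
    """Map a continuous score to a risk category.

    Scores below `lower` are benign, scores above `upper` are malignant,
    and everything in the closed interval [lower, upper] is intermediate.
    """
    if score < lower:
        return "benign"
    if score > upper:
        return "malignant"
    return "intermediate"


def stratify_model1(score_percent: float) -> Category:
    """Model 1 thresholds as reported: <19% / 19-62% / >62% (score in percent)."""
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("Model 1 score must lie in 0-100 (percent)")
    return stratify(score_percent, lower=19.0, upper=62.0)


def stratify_model2(score: float) -> Category:
    """Model 2 thresholds as reported: <0.1 / 0.1-0.9 / >0.9 (score in 0.0-1.0)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("Model 2 score must lie in 0.0-1.0")
    return stratify(score, lower=0.1, upper=0.9)


if __name__ == "__main__":
    # Illustrative scores only; these are not study data.
    for s in (5.0, 19.0, 62.0, 80.0):
        print(f"Model 1 score {s:5.1f}% -> {stratify_model1(s)}")
    for s in (0.05, 0.5, 0.95):
        print(f"Model 2 score {s:4.2f}  -> {stratify_model2(s)}")
```

Because the two models use different scales, a Model 1 score of 50 and a Model 2 score of 0.5 both fall in the intermediate band, but the widths of the bands differ, so the categories are not directly interchangeable.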
Key Findings
All three AI models automatically detected and classified pulmonary nodules between 4 and 30 mm on thoracic CT scans, the size range common to all of them; Models 2 and 3 also accept nodules from 3 mm.
Model 1 provided a continuous malignancy probability score with defined thresholds, enabling stratification into benign, intermediate, and malignant categories.
Model 2 also produced a continuous malignancy score but limited malignancy risk calculation to the seven largest nodules per scan.
Model 3 offered only categorical malignancy risk outputs without continuous scoring or numerical thresholds.
The AI models were trained primarily to detect primary lung cancers. The study included separate subgroup analyses for nodules between 5 and 8 mm and for the differentiation of primary malignancies from metastases.
Diagnostic accuracy was assessed using ROC curves and AUC values, with statistical significance set at p < 0.05.
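The AUC used in the study can be illustrated with a short calculation. The sketch below is an assumption-laden example, not the study's analysis: the report does not state which software was used, the function name `roc_auc` is hypothetical, and the labels and scores are invented for illustration. It computes the AUC through its rank interpretation, the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for malignant (by histopathology), 0 for benign.
    scores: model malignancy scores, higher meaning more suspicious.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # malignant nodule ranked above benign nodule
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data, not taken from the study.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.95, 0.40, 0.05, 0.30, 0.70, 0.55]
    print(f"AUC = {roc_auc(labels, scores):.3f}")
```

The same function works for an ordinal output such as low, medium, and high if the categories are first mapped to increasing integers; that mapping is likewise an assumption and not something the report specifies.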
Clinical Implications
The evaluated AI models demonstrate potential to assist radiologists in classifying pulmonary nodules, particularly by providing malignancy risk stratification that may guide follow-up and biopsy decisions. However, differences in output formats and thresholds necessitate careful interpretation within clinical workflows. Limitations such as restricted nodule size ranges and scoring constraints should be considered when integrating these tools into practice.
Conclusion
This assessment highlights the diagnostic capabilities and limitations of current AI models for pulmonary nodule classification. Further validation and standardization are needed to support clinical adoption and improve early detection of lung cancer.
References
Lung Cancer Mortality and Early Detection Context
AI Models for Pulmonary Nodule Classification Evaluation Study, by Sarah K. Herber, Lukas Müller, Daniel Pinto dos Santos, Tobias Jorg, Fabio Souschek, Tobias Bäuerle, Sebastian Foersch, Christian Galata, Peter Mildenberger, Moritz C. Halfmann