Diagnostic accuracy of deep learning vs. human raters for detecting osteoporotic vertebral compression fractures in routine CT scans - Report - MDSpire
Comparative Diagnostic Performance of Deep Learning and Humans in Osteoporotic Vertebral Fracture Detection
Overview
This study compared the diagnostic accuracy of four deep learning models, one commercial DL algorithm, and eight human raters in identifying osteoporotic vertebral compression fractures on routine CT scans. Using a large dataset of 3548 vertebrae from 331 patients, the DL models demonstrated competitive performance relative to human evaluators across multiple fracture severity levels and spinal regions.
Background
Osteoporosis leads to fragile bones and an increased risk of vertebral compression fractures, which contribute substantially to morbidity and mortality. Early and accurate detection of these fractures is critical for starting treatment in time. Although CT imaging is valuable for assessing bone quality, diagnosing mild osteoporotic fractures remains challenging because degenerative changes and anatomical variations can mimic or obscure them. Deep learning algorithms have emerged as promising tools for fracture detection because they can pick up subtle imaging patterns that human readers may overlook.
Data Highlights
| Evaluator Type | Number of Evaluators/Models | Dataset Vertebrae | Fracture Prevalence (%) | CT Acquisition Details |
| --- | --- | --- | --- | --- |
| Deep Learning Models | 4 in-house + 1 commercial | 3548 vertebrae (331 patients) | 10.6% any fracture; 9.1% moderate/severe (Genant 2 or 3) | 120 kVp, slice thickness 0.9-1.5 mm, bone kernel reconstruction |
| Human Raters | 8 (students, residents, attendings) | Same as above | Same as above | Same as above |
Key Findings
Deep learning models were trained on large, heterogeneous CT datasets drawn from multi-scanner environments with diverse acquisition parameters, a design intended to make them robust to differences between scanners and protocols.
Evaluation used the independent VerSe 19 & 20 datasets, containing routine clinical CT scans with a broad spectrum of spinal pathologies and anatomical variations.
DL algorithms and human raters were assessed on fracture detection at patient level, single vertebra level, and by spinal region (upper thoracic, lower thoracic, lumbar).
DL models showed comparable or superior accuracy to human evaluators in detecting any fracture (Genant 1–3) and clinically relevant moderate/severe fractures (Genant 2 or 3).
Region-specific analysis accounted for varying fracture prevalence and demonstrated consistent DL performance across spinal regions.
Inclusion of degenerative changes and other osseous alterations in the test set challenged both DL and human raters, highlighting the clinical relevance of the evaluation.
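The levels of analysis described above (any fracture vs. moderate/severe, single vertebra vs. patient) can be illustrated with a minimal Python sketch. It is not the study's evaluation code: the record layout, the function names, the toy data, and the rule that a patient counts as positive when at least one vertebra meets the threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def binarize(grade, min_grade):
    """True if a Genant grade (0-3) meets the chosen fracture threshold."""
    return grade >= min_grade


def sensitivity_specificity(truth, pred):
    """Return (sensitivity, specificity) for paired boolean labels."""
    pairs = list(zip(truth, pred))
    tp = sum(t and p for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def vertebra_level(records, min_grade):
    """Score every vertebra as its own case."""
    truth = [binarize(r["truth"], min_grade) for r in records]
    pred = [binarize(r["pred"], min_grade) for r in records]
    return sensitivity_specificity(truth, pred)


def patient_level(records, min_grade):
    """Assumed rule: a patient is positive if any vertebra meets the threshold."""
    truth, pred = defaultdict(bool), defaultdict(bool)
    for r in records:
        truth[r["patient"]] |= binarize(r["truth"], min_grade)
        pred[r["patient"]] |= binarize(r["pred"], min_grade)
    ids = sorted(truth)
    return sensitivity_specificity([truth[i] for i in ids], [pred[i] for i in ids])


# Hypothetical toy data: reference ("truth") and predicted ("pred") Genant grades.
records = [
    {"patient": "A", "vertebra": "T12", "truth": 2, "pred": 2},
    {"patient": "A", "vertebra": "L1", "truth": 0, "pred": 1},
    {"patient": "B", "vertebra": "L2", "truth": 1, "pred": 0},
    {"patient": "B", "vertebra": "L3", "truth": 0, "pred": 0},
    {"patient": "C", "vertebra": "T8", "truth": 0, "pred": 0},
    {"patient": "C", "vertebra": "T9", "truth": 3, "pred": 2},
    {"patient": "D", "vertebra": "L4", "truth": 0, "pred": 0},
]

ANY_FRACTURE = 1      # Genant 1-3
MODERATE_SEVERE = 2   # Genant 2 or 3

for label, threshold in [("any", ANY_FRACTURE), ("moderate/severe", MODERATE_SEVERE)]:
    print(label, "vertebra:", vertebra_level(records, threshold))
    print(label, "patient:", patient_level(records, threshold))
```

A region-specific analysis could reuse `vertebra_level` on the subset of records belonging to each spinal region. The region boundaries are not given in this summary, so they are left out of the sketch.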
Clinical Implications
Deep learning algorithms can serve as effective adjuncts to human readers in routine CT imaging for osteoporotic vertebral fracture detection, potentially improving diagnostic consistency and early identification. Their ability to analyze subtle imaging features may reduce missed fractures, especially mild ones, facilitating timely intervention. Integration of DL tools into clinical workflows could enhance reporting accuracy and patient management.
Conclusion
The study indicates that deep learning models achieve diagnostic performance comparable to that of human evaluators, from students to attendings, in identifying osteoporotic vertebral compression fractures on routine CT scans. These findings support the use of DL algorithms as complementary tools that could improve fracture detection and, in turn, patient management.
References
Kaltenbach et al. 2024 -- Comparative Diagnostic Performance of Deep Learning Algorithms and Human Evaluators in Identifying Osteoporotic Vertebral Compression Fractures on Routine CT Imaging
by Evamaria O. Riedel, David Schinz, Matthias Keicher, Sebastian Rühling, Malek El Husseini, Chantal Pellegrini, Thomas Baum, Michael Dieckmeyer, Luca Malagutti, Isabel Seeger, Anna S. Walburga, Benedikt Wiestler, Nico Sollmann, Maximilian T. Löffler, Arthur Wagner, Jan S. Kirschke