Large language model-based uncertainty-adjusted label extraction for artificial intelligence model development in upper extremity radiography - Report - MDSpire
LLM-Based Uncertainty-Aware Label Extraction for Upper Extremity Radiography AI Models
Overview
This study demonstrates that large language models (LLMs), specifically GPT-4o, can accurately extract multi-label structured data from radiologic reports of the clavicle, elbow, and thumb, including uncertainty detection. Incorporating uncertainty-aware labeling strategies enabled effective training of convolutional neural networks (CNNs) for multi-label classification, with model performance validated on internal and external datasets.
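As an illustration of what "multi-label structured data with uncertainty detection" could look like in practice, the following minimal sketch validates an LLM reply that assigns each finding one of three states. It makes several assumptions: the finding names are hypothetical (the report does not list the label set), the reply is assumed to be a JSON object, and treating unmentioned findings as false is a choice made here, not one taken from the study. The call to the model itself is omitted.

```python
import json

# Hypothetical finding names; the source does not list the actual label set.
FINDINGS = ("fracture", "dislocation", "osteoarthritis")
STATES = {"true", "false", "uncertain"}


def parse_llm_labels(raw_response: str) -> dict[str, str]:
    """Validate a JSON reply that maps each finding to true/false/uncertain."""
    data = json.loads(raw_response)
    labels = {}
    for finding in FINDINGS:
        # Assumption: a finding the model did not mention is treated as absent.
        state = str(data.get(finding, "false")).lower()
        if state not in STATES:
            raise ValueError(f"unexpected state {state!r} for {finding!r}")
        labels[finding] = state
    return labels


# Example reply, written by hand for illustration (not real model output).
reply = '{"fracture": "uncertain", "dislocation": "false"}'
print(parse_llm_labels(reply))
```

Rejecting any state outside the three allowed values keeps malformed replies from silently entering the training labels.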
Background
Radiologic imaging is performed billions of times annually worldwide, yet AI development is hindered by limited annotated datasets. Manual annotation is resource-intensive and prone to inconsistency, while traditional NLP methods for label extraction struggle with complex terminology and uncertainty in reports. Large language models offer a promising alternative by interpreting nuanced language and extracting structured labels, including uncertain findings, which are common in radiology. Prior work has not addressed multi-label extraction across multiple upper extremity regions or accounted for uncertainty in labels.
Data Highlights
Dataset            | Region                 | Number of Patients | Data Split
Internal (Aachen)  | Clavicle, Elbow, Thumb | Not specified      | Training 64%, Validation 16%, Test 20%
External (Cologne) | Clavicle, Elbow, Thumb | 300 per region     | Test only
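The internal 64% / 16% / 20% proportions can be reproduced with a simple shuffled split, sketched below. Two assumptions are made here that the report does not confirm: that the split is done at the patient level (so no patient appears in more than one partition) and that a fixed random seed is used. The function name `split_patients` is hypothetical. Arithmetically, the proportions match an 80/20 train-test split followed by an 80/20 split of the training portion (0.8 × 0.8 = 0.64, 0.8 × 0.2 = 0.16), though the report does not say that is how they were obtained.

```python
import random


def split_patients(patient_ids, seed=0):
    """Shuffle patient IDs and cut them into 64% / 16% / 20% partitions."""
    ids = sorted(set(patient_ids))  # deduplicate so each patient is placed once
    random.Random(seed).shuffle(ids)
    n_train = round(0.64 * len(ids))
    n_val = round(0.16 * len(ids))
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder, roughly 20%
    }


splits = split_patients(range(1000))
print({name: len(ids) for name, ids in splits.items()})
```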
Key Findings
GPT-4o effectively extracted structured labels from free-text radiologic reports across multiple upper extremity regions.
Labels included three states: true, false, and uncertain, capturing diagnostic ambiguity inherent in radiology reports.
Uncertain labels were handled during CNN training with one of two strategies: inclusive (uncertain counted as true) or exclusive (uncertain counted as false).
Multi-label CNNs trained on LLM-extracted labels achieved robust classification performance on both internal and external test sets.
Accounting for label uncertainty did not adversely affect model performance, which supports the feasibility of uncertainty-aware labeling.
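The inclusive and exclusive strategies described above amount to a choice of how the uncertain state is mapped to a binary training target. A minimal sketch, assuming labels are stored per report as a mapping from finding to state (the finding names and the `binarize` helper are hypothetical, not taken from the study):

```python
def binarize(labels, strategy):
    """Map true/false/uncertain states to 1/0 targets for CNN training."""
    if strategy not in ("inclusive", "exclusive"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Inclusive counts uncertain as true (1); exclusive counts it as false (0).
    uncertain_value = 1 if strategy == "inclusive" else 0
    mapping = {"true": 1, "false": 0, "uncertain": uncertain_value}
    return {finding: mapping[state] for finding, state in labels.items()}


# Hypothetical labels for a single report.
report_labels = {"fracture": "uncertain", "dislocation": "false", "effusion": "true"}
print(binarize(report_labels, "inclusive"))
print(binarize(report_labels, "exclusive"))
```

Only the uncertain entries differ between the two outputs, so training one model per strategy isolates the effect of how ambiguity is treated.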
Clinical Implications
The use of LLMs for automated, uncertainty-aware label extraction can significantly reduce the labor and cost associated with manual annotation of radiologic datasets. This approach enables scalable development of AI models for multi-label classification in upper extremity radiography, potentially improving diagnostic support tools. Incorporating uncertainty in labels preserves clinically relevant ambiguity, which may enhance model robustness and generalizability.
Conclusion
LLMs such as GPT-4o can accurately and efficiently extract multi-label, uncertainty-aware annotations from radiologic reports, facilitating the training of effective AI models for upper extremity radiography. This methodology addresses key challenges in dataset curation and supports the advancement of clinically relevant AI applications.