Large language model-based uncertainty-adjusted label extraction for artificial intelligence model development in upper extremity radiography

By
Hanna Kreutzer
Anne-Sophie Caselitz
Thomas Dratsch
Daniel Pinto dos Santos
Christiane Kuhl
Daniel Truhn
Sven Nebelung
November 14, 2025

European Radiology

At a Glance

Category	Detail
Condition	Upper extremity radiographic conditions including fractures and other pathologies of the clavicle, elbow, and thumb
Key Mechanisms	Automated label extraction from radiologic reports using large language models (LLMs) with uncertainty detection to train multi-label classification convolutional neural networks (CNNs)
Target Population	Adult patients (≥18 years) undergoing upper extremity radiography
Care Setting	Radiology departments in tertiary care hospitals with access to digital radiography and AI model development infrastructure

Key Highlights

LLMs (e.g., GPT-4o) can accurately extract structured labels from free-text radiologic reports across multiple upper extremity regions.
Capturing label uncertainty (hedging terms such as 'likely' or 'suggestive') during automated extraction accounts for diagnostic ambiguity and reduces label noise in training datasets.
Multi-label classification models trained on LLM-extracted labels perform well on both internal and external test datasets.

Guideline-Based Recommendations

Diagnosis

Use LLMs to extract structured, multi-label diagnostic information from radiologic reports for upper extremity imaging.
Incorporate uncertainty detection in label extraction to reflect diagnostic ambiguity inherent in radiology reports.
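The extraction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the region templates, finding names, prompt wording, and the three-state "positive"/"negative"/"uncertain" encoding are all hypothetical. The actual LLM call (for example, to GPT-4o) is omitted, so the sketch covers only prompt construction and validation of the model's JSON reply.

```python
import json

# Hypothetical three-state encoding; the study's exact label schema is not given here.
ALLOWED_STATES = {"positive", "negative", "uncertain"}

# Hypothetical region-specific templates; finding names are illustrative only.
TEMPLATES = {
    "clavicle": ["fracture", "ac_joint_dislocation"],
    "elbow": ["fracture", "joint_effusion"],
    "thumb": ["fracture", "osteoarthritis"],
}


def build_prompt(region: str, report: str) -> str:
    """Compose a prompt asking the LLM for one state per templated finding."""
    findings = ", ".join(TEMPLATES[region])
    return (
        f"Read the {region} radiograph report below. For each finding "
        f"({findings}), answer 'positive', 'negative', or 'uncertain'. "
        "Use 'uncertain' when the report hedges, e.g. 'likely' or "
        "'suggestive of'. Reply with a JSON object only.\n\n"
        f"Report:\n{report}"
    )


def parse_labels(region: str, llm_reply: str) -> dict:
    """Validate the LLM's JSON reply against the region template."""
    data = json.loads(llm_reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    labels = {}
    for finding in TEMPLATES[region]:
        if finding not in data:
            # Reject incomplete replies instead of silently assuming "negative".
            raise ValueError(f"reply is missing finding {finding!r}")
        state = str(data[finding]).strip().lower()
        if state not in ALLOWED_STATES:
            raise ValueError(f"unexpected state {state!r} for {finding!r}")
        labels[finding] = state
    return labels


reply = '{"fracture": "Uncertain", "joint_effusion": "positive"}'
print(parse_labels("elbow", reply))  # {'fracture': 'uncertain', 'joint_effusion': 'positive'}
```

Rejecting replies with missing findings or unknown states, rather than defaulting them, keeps extraction errors visible for expert review instead of letting them enter the training set as silent negatives.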

Management

Train convolutional neural networks on LLM-extracted labels using uncertainty-adjusted inclusive and exclusive labeling strategies, in which uncertain findings are counted as positive or negative, respectively.
Use multi-label classification models to detect both common and less frequent conditions in clavicle, elbow, and thumb radiographs.
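One way to turn three-state labels into training targets for a multi-label CNN is sketched below, assuming that the inclusive strategy counts uncertain findings as present and the exclusive strategy counts them as absent. The function names are hypothetical. The second function illustrates why the task is multi-label rather than multi-class: each finding gets its own sigmoid and threshold, so one radiograph can be positive for several conditions at once.

```python
import math

# Assumed mapping: "inclusive" counts uncertain findings as present,
# "exclusive" counts them as absent.
STRATEGIES = {
    "inclusive": {"positive": 1, "uncertain": 1, "negative": 0},
    "exclusive": {"positive": 1, "uncertain": 0, "negative": 0},
}


def to_multi_hot(labels: dict, findings: list, strategy: str) -> list:
    """Binarize three-state labels into a multi-hot training target."""
    mapping = STRATEGIES[strategy]
    return [mapping[labels[f]] for f in findings]


def predict_findings(logits: list, findings: list, threshold: float = 0.5) -> list:
    """Multi-label decoding: an independent sigmoid per finding, so several can be positive."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [f for f, p in zip(findings, probs) if p >= threshold]


findings = ["fracture", "joint_effusion"]
labels = {"fracture": "uncertain", "joint_effusion": "positive"}
print(to_multi_hot(labels, findings, "inclusive"))  # [1, 1]
print(to_multi_hot(labels, findings, "exclusive"))  # [0, 1]
print(predict_findings([2.0, 1.5], findings))  # ['fracture', 'joint_effusion']
```

The same report thus yields different targets depending on the strategy; training one model per strategy and evaluating both on identical test sets is a direct way to measure how uncertainty handling affects performance.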

Monitoring & Follow-up

Validate AI model performance on both internal and external datasets with manually corrected ground truth labels.
Monitor mislabeling rates and model generalizability across different imaging centers and equipment.
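The monitoring steps above reduce to a few simple counts. The sketch below, with hypothetical function names and 0/1 labels per finding, computes the per-finding mislabeling rate of extracted labels against manually corrected ground truth, plus the sensitivity and specificity of model predictions for a single finding. Running the same functions separately on internal and external test sets gives a coarse check of generalizability; it does not reproduce the study's evaluation protocol, which is not detailed here.

```python
import math


def mislabeling_rates(extracted: list, corrected: list) -> dict:
    """Per-finding share of exams whose extracted 0/1 label differs from the corrected one."""
    if not corrected or len(extracted) != len(corrected):
        raise ValueError("need two non-empty label lists of equal length")
    rates = {}
    for finding in corrected[0]:
        errors = sum(e[finding] != c[finding] for e, c in zip(extracted, corrected))
        rates[finding] = errors / len(corrected)
    return rates


def sensitivity_specificity(y_true: list, y_pred: list) -> tuple:
    """Sensitivity and specificity for one finding; NaN when a class is absent."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sens = tp / (tp + fn) if tp + fn else math.nan
    spec = tn / (tn + fp) if tn + fp else math.nan
    return sens, spec


extracted = [{"fracture": 1, "effusion": 0}, {"fracture": 0, "effusion": 0}]
corrected = [{"fracture": 1, "effusion": 1}, {"fracture": 0, "effusion": 0}]
print(mislabeling_rates(extracted, corrected))  # {'fracture': 0.0, 'effusion': 0.5}
print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 1]))  # (0.5, 0.5)
```

Returning NaN instead of raising when a finding has no positive (or no negative) cases matters for less frequent conditions, which may be absent from a small external test set.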

Risks

Ignoring label uncertainty may introduce noise and degrade AI model performance.
Automated label extraction without expert oversight risks mislabeling, particularly where reports use complex terminology.

Patient & Prescribing Data

Adult patients undergoing upper extremity radiography at university hospitals

Automated label extraction using LLMs facilitates efficient and scalable AI model training for fracture and pathology detection, potentially improving diagnostic workflows.

Clinical Best Practices

Apply exclusion criteria including age <18 years, post-operative imaging, follow-up exams, and amputations to ensure dataset consistency.
Use region-specific structured templates for label extraction to capture relevant conditions per anatomic site.
Convert uncertain labels using inclusive or exclusive strategies during model training to assess their impact on performance.
Employ multi-center datasets and external validation to ensure model robustness and generalizability.
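The exclusion criteria in the first item above can be applied as a simple filter that also tallies exclusion reasons, which is useful when reporting how the final cohort was derived. The `Exam` fields and function names are hypothetical stand-ins for whatever metadata the imaging archive provides; in practice, flags such as post-operative status or follow-up intent may themselves have to be derived from reports or request forms.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Exam:
    # Hypothetical metadata fields; a real export would define its own.
    patient_age: int
    is_postoperative: bool
    is_follow_up: bool
    has_amputation: bool


def exclusion_reasons(exam: Exam) -> list:
    """Return every exclusion criterion the exam meets (empty list means eligible)."""
    reasons = []
    if exam.patient_age < 18:
        reasons.append("age <18 years")
    if exam.is_postoperative:
        reasons.append("post-operative imaging")
    if exam.is_follow_up:
        reasons.append("follow-up examination")
    if exam.has_amputation:
        reasons.append("amputation")
    return reasons


def split_cohort(exams: list) -> tuple:
    """Keep eligible exams and count how often each exclusion reason applied."""
    included, counts = [], Counter()
    for exam in exams:
        reasons = exclusion_reasons(exam)
        if reasons:
            counts.update(reasons)
        else:
            included.append(exam)
    return included, counts


exams = [Exam(45, False, False, False), Exam(16, False, False, False), Exam(60, True, True, False)]
included, counts = split_cohort(exams)
print(len(included), dict(counts))
# 1 {'age <18 years': 1, 'post-operative imaging': 1, 'follow-up examination': 1}
```

Because an exam can meet several criteria, the reason counts may sum to more than the number of excluded exams; report both figures if a flowchart is needed.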