Large language model-based uncertainty-adjusted label extraction for artificial intelligence model development in upper extremity radiography - Scorecard - MDSpire

By Hanna Kreutzer, Anne-Sophie Caselitz, Thomas Dratsch, Daniel Pinto dos Santos, Christiane Kuhl, Daniel Truhn, Sven Nebelung

November 14, 2025

Clinical Scorecard: Utilizing Large Language Models for Uncertainty-Adjusted Label Extraction in Developing AI Models for Upper Extremity Radiography

At a Glance

Condition: Upper extremity radiographic conditions, including fractures and other pathologies of the clavicle, elbow, and thumb
Key Mechanisms: Automated label extraction from radiologic reports using large language models (LLMs) with uncertainty detection, used to train multi-label classification convolutional neural networks (CNNs)
Target Population: Adult patients (≥18 years) undergoing upper extremity radiography
Care Setting: Radiology departments in tertiary care hospitals with access to digital radiography and AI model development infrastructure

Key Highlights

  • LLMs (e.g., GPT-4o) can accurately extract structured labels from free-text radiologic reports across multiple upper extremity regions.
  • Flagging uncertain findings (hedged wording such as 'likely' or 'suggestive of') during automated extraction captures the diagnostic ambiguity of reports and reduces label noise in training datasets.
  • Multi-label classification models trained on LLM-extracted labels performed well on both internal and external test datasets.

Guideline-Based Recommendations

Diagnosis

  • Use LLMs to extract structured, multi-label diagnostic information from radiologic reports for upper extremity imaging.
  • Incorporate uncertainty detection in label extraction to reflect diagnostic ambiguity inherent in radiology reports.
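
A minimal Python sketch of region-specific extraction with an explicit uncertainty state. It assumes the LLM (for example GPT-4o, called through whatever client is available; the call itself is omitted) is asked to return one JSON object per report, giving each templated finding the value positive, negative, or uncertain. The template contents, prompt wording, and the functions `build_prompt` and `parse_labels` are hypothetical illustrations, not the study's actual prompts or label sets.

```python
from __future__ import annotations

import json

# Hypothetical region-specific templates; the finding names are
# illustrative placeholders, not the label sets used in the study.
TEMPLATES = {
    "clavicle": ["fracture", "ac_joint_dislocation"],
    "elbow": ["fracture", "joint_effusion"],
    "thumb": ["fracture", "dislocation"],
}

# Allowed values per finding; "uncertain" captures hedged wording.
ALLOWED = {"positive", "negative", "uncertain"}


def build_prompt(region: str, report: str) -> str:
    """Assemble an extraction prompt for one report (hypothetical wording)."""
    findings = ", ".join(TEMPLATES[region])
    return (
        f"Read the {region} radiograph report below. For each finding "
        f"({findings}) answer 'positive', 'negative', or 'uncertain'. "
        "Use 'uncertain' for hedged statements such as 'likely' or "
        "'suggestive of'. Reply with a JSON object only.\n\n"
        f"Report:\n{report}"
    )


def parse_labels(region: str, llm_reply: str) -> dict[str, str]:
    """Validate the model's JSON reply against the region template."""
    data = json.loads(llm_reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    labels = {}
    for finding in TEMPLATES[region]:
        value = str(data.get(finding, "")).strip().lower()
        if value not in ALLOWED:
            # Missing or unexpected answers fail loudly instead of
            # silently becoming negative labels.
            raise ValueError(f"invalid value for {finding!r}: {value!r}")
        labels[finding] = value
    return labels
```

For example, `parse_labels("elbow", '{"fracture": "uncertain", "joint_effusion": "positive"}')` returns `{'fracture': 'uncertain', 'joint_effusion': 'positive'}`. Validating each reply against the template keeps malformed or incomplete LLM output out of the training set.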

Management

  • Train convolutional neural networks on LLM-extracted labels, using uncertainty-adjusted inclusive and exclusive labeling strategies.
  • Use multi-label classification models to detect both common and less frequent conditions in clavicle, elbow, and thumb radiographs.
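
The inclusive and exclusive strategies can be sketched as a mapping from tri-state labels to binary multi-label target vectors. This sketch assumes inclusive means treating an uncertain finding as present and exclusive means treating it as absent, and that a finding missing from a report defaults to negative; `to_target` and the finding names are hypothetical.

```python
from __future__ import annotations


def to_target(labels: dict[str, str], findings: list[str], strategy: str) -> list[int]:
    """Convert tri-state labels into a binary multi-label target vector.

    strategy="inclusive": uncertain findings count as present (1).
    strategy="exclusive": uncertain findings count as absent (0).
    Findings not mentioned in `labels` are assumed negative (0).
    """
    if strategy not in ("inclusive", "exclusive"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    uncertain_value = 1 if strategy == "inclusive" else 0
    target = []
    for finding in findings:
        state = labels.get(finding, "negative")
        if state == "positive":
            target.append(1)
        elif state == "uncertain":
            target.append(uncertain_value)
        else:
            target.append(0)
    return target
```

With `labels = {"fracture": "uncertain", "joint_effusion": "positive"}` and `findings = ["fracture", "joint_effusion", "dislocation"]`, the inclusive strategy gives `[1, 1, 0]` and the exclusive strategy gives `[0, 1, 0]`. Training one model per strategy on the same images isolates the effect of uncertainty handling. A multi-label CNN typically uses one sigmoid output per finding with binary cross-entropy, so each position in the vector is scored independently.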

Monitoring & Follow-up

  • Validate AI model performance on both internal and external datasets with manually corrected ground truth labels.
  • Monitor mislabeling rates and model generalizability across different imaging centers and equipment.
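
One simple way to monitor mislabeling is to compare extracted labels with the manually corrected ground truth, per finding and per imaging center. The sketch below assumes both are stored as one dict per report, mapping finding names to label strings; `mislabeling_rates` and `rates_by_site` are hypothetical helpers, not part of any published tooling.

```python
from collections import defaultdict


def mislabeling_rates(extracted, ground_truth, findings):
    """Per-finding share of reports whose extracted label differs from ground truth.

    extracted, ground_truth: lists of dicts (one per report, same order)
    mapping finding name -> label string.
    """
    if len(extracted) != len(ground_truth):
        raise ValueError("extracted and ground_truth must have the same length")
    if not extracted:
        return {f: 0.0 for f in findings}
    errors = defaultdict(int)
    for pred, truth in zip(extracted, ground_truth):
        for f in findings:
            if pred.get(f) != truth.get(f):
                errors[f] += 1
    n = len(extracted)
    return {f: errors[f] / n for f in findings}


def rates_by_site(records, findings):
    """Group (site, extracted, ground_truth) triples by site; compute rates per site."""
    by_site = defaultdict(lambda: ([], []))
    for site, pred, truth in records:
        by_site[site][0].append(pred)
        by_site[site][1].append(truth)
    return {
        site: mislabeling_rates(preds, truths, findings)
        for site, (preds, truths) in by_site.items()
    }
```

Markedly different rates between centers may indicate that local reporting style or terminology needs closer review before labels from that site are used for training.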

Risks

  • Ignoring label uncertainty may introduce noise and degrade AI model performance.
  • Automated label extraction without expert oversight risks mislabeling, especially where terminology is complex.

Patient & Prescribing Data

Adult patients undergoing upper extremity radiography at university hospitals

Automated label extraction using LLMs facilitates efficient and scalable AI model training for fracture and pathology detection, potentially improving diagnostic workflows.

Clinical Best Practices

  • Apply exclusion criteria including age <18 years, post-operative imaging, follow-up exams, and amputations to ensure dataset consistency.
  • Use region-specific structured templates for label extraction to capture relevant conditions per anatomic site.
  • Convert uncertain labels to positive (inclusive) or negative (exclusive) during model training, and compare both strategies to assess their impact on performance.
  • Employ multi-center datasets and external validation to ensure model robustness and generalizability.
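
The exclusion criteria in the first bullet can be applied as a simple filter before label extraction. The `Study` record and its field names are hypothetical stand-ins for whatever metadata a PACS or RIS export provides; how each flag is derived (for example, recognizing a follow-up exam) is left out of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical metadata fields for one radiographic study.
    patient_age: int
    region: str  # e.g. "clavicle", "elbow", "thumb"
    post_operative: bool
    follow_up: bool
    amputation: bool


def is_eligible(study: Study) -> bool:
    """Apply the listed exclusion criteria: age <18, post-operative, follow-up, amputation."""
    return (
        study.patient_age >= 18
        and not study.post_operative
        and not study.follow_up
        and not study.amputation
    )


def filter_cohort(studies: list[Study]) -> list[Study]:
    """Keep only studies that pass every exclusion criterion."""
    return [s for s in studies if is_eligible(s)]
```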
