Large language model-based uncertainty-adjusted label extraction for artificial intelligence model development in upper extremity radiography - Scorecard - MDSpire
Clinical Scorecard: Utilizing Large Language Models for Uncertainty-Adjusted Label Extraction in Developing AI Models for Upper Extremity Radiography
At a Glance
Category | Detail
Condition | Upper extremity radiographic conditions including fractures and other pathologies of the clavicle, elbow, and thumb
Key Mechanisms | Automated label extraction from radiologic reports using large language models (LLMs) with uncertainty detection to train multi-label classification convolutional neural networks (CNNs)
Radiology departments in tertiary care hospitals with access to digital radiography and AI model development infrastructure
Key Highlights
LLMs (e.g., GPT-4o) can accurately extract structured labels from free-text radiologic reports across multiple upper extremity regions.
Flagging hedged report language (e.g., 'likely', 'suggestive') during automated extraction captures diagnostic ambiguity and reduces label noise in training datasets.
Multi-label classification models trained on LLM-extracted labels demonstrate effective performance on internal and external test datasets.
Guideline-Based Recommendations
Diagnosis
Use LLMs to extract structured, multi-label diagnostic information from radiologic reports for upper extremity imaging.
Incorporate uncertainty detection in label extraction to reflect diagnostic ambiguity inherent in radiology reports.
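As a rough illustration of the two points above, the sketch below validates a hypothetical LLM reply against a per-region template and keeps uncertainty as its own label state. The condition names, the JSON reply format, and the helper `parse_llm_labels` are assumptions made for this example, not details taken from the study.

```python
import json
from enum import Enum


class LabelState(Enum):
    ABSENT = "absent"
    PRESENT = "present"
    UNCERTAIN = "uncertain"  # report hedges, e.g. "likely", "suggestive"


# Hypothetical region-specific templates; condition names are illustrative only.
TEMPLATES = {
    "clavicle": ["fracture", "dislocation"],
    "elbow": ["fracture", "joint_effusion"],
    "thumb": ["fracture", "osteoarthritis"],
}


def parse_llm_labels(region, raw_json):
    """Validate an LLM's JSON reply against the template for one region.

    Conditions missing from the reply default to ABSENT; any state other
    than absent/present/uncertain raises ValueError.
    """
    reply = json.loads(raw_json)
    return {
        condition: LabelState(reply.get(condition, "absent"))
        for condition in TEMPLATES[region]
    }


# Example: the model flags an elbow fracture as uncertain.
labels = parse_llm_labels("elbow", '{"fracture": "uncertain"}')
```

Keeping "uncertain" as a separate state, rather than forcing a yes/no answer at extraction time, leaves the decision about how to treat ambiguity to the training step.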
Management
Train convolutional neural networks using LLM-extracted labels with uncertainty-adjusted inclusive and exclusive labeling strategies.
Utilize multi-label classification models to detect common and less frequent conditions in clavicle, elbow, and thumb radiographs.
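The inclusive and exclusive strategies can be sketched as a simple mapping from three-state labels to binary training targets. This assumes that "inclusive" counts uncertain findings as positive and "exclusive" counts them as negative; the function name and label strings are hypothetical.

```python
def to_binary_targets(labels, strategy):
    """Map present/absent/uncertain labels to 0/1 training targets.

    Assumed semantics: "inclusive" treats uncertain as positive (1),
    "exclusive" treats uncertain as negative (0).
    """
    if strategy not in ("inclusive", "exclusive"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    uncertain_value = 1 if strategy == "inclusive" else 0
    mapping = {"present": 1, "absent": 0, "uncertain": uncertain_value}
    return {condition: mapping[state] for condition, state in labels.items()}


# One report with a hedged fracture finding (illustrative condition names).
report_labels = {"fracture": "uncertain", "joint_effusion": "present"}
inclusive = to_binary_targets(report_labels, "inclusive")
exclusive = to_binary_targets(report_labels, "exclusive")
```

Training one model per strategy on otherwise identical data is one way to compare how the handling of uncertainty affects performance.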
Monitoring & Follow-up
Validate AI model performance on both internal and external datasets with manually corrected ground truth labels.
Monitor mislabeling rates and model generalizability across different imaging centers and equipment.
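One way to track mislabeling is to compare the automatically extracted labels with the manually corrected ground truth, condition by condition. The sketch below assumes both label sets are dictionaries keyed by a report ID and holding binary labels; the data layout and names are illustrative.

```python
from collections import defaultdict


def mislabel_rates(extracted, corrected):
    """Per-condition share of reports whose extracted label differs
    from the manually corrected label."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for report_id, truth in corrected.items():
        for condition, true_value in truth.items():
            totals[condition] += 1
            if extracted[report_id][condition] != true_value:
                errors[condition] += 1
    return {condition: errors[condition] / totals[condition] for condition in totals}


# Toy data: the extractor missed one effusion in report r1.
extracted = {
    "r1": {"fracture": 1, "effusion": 0},
    "r2": {"fracture": 0, "effusion": 0},
}
corrected = {
    "r1": {"fracture": 1, "effusion": 1},
    "r2": {"fracture": 0, "effusion": 0},
}
rates = mislabel_rates(extracted, corrected)
```

Computing the same rates separately for each imaging center would show whether extraction quality shifts between sites.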
Risks
Ignoring label uncertainty may introduce noise and degrade AI model performance.
Automated label extraction without expert oversight risks mislabeling, especially where report terminology is complex.
Patient & Prescribing Data
Adult patients undergoing upper extremity radiography at university hospitals
Automated label extraction using LLMs facilitates efficient and scalable AI model training for fracture and pathology detection, potentially improving diagnostic workflows.
Clinical Best Practices
Apply exclusion criteria including age <18 years, post-operative imaging, follow-up exams, and amputations to ensure dataset consistency.
Use region-specific structured templates for label extraction to capture relevant conditions per anatomic site.
Convert uncertain labels to inclusive or exclusive categories during model training to assess impact on performance.
Employ multi-center datasets and external validation to ensure model robustness and generalizability.
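The exclusion criteria listed above can be expressed as a small eligibility filter. This is a minimal sketch: the `Exam` record and its field names are hypothetical, and a real pipeline would derive these flags from its own metadata.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    patient_age: int
    post_operative: bool = False
    follow_up: bool = False
    amputation: bool = False


def is_eligible(exam):
    """Apply the exclusion criteria: age under 18, post-operative imaging,
    follow-up exams, and amputations."""
    return (
        exam.patient_age >= 18
        and not exam.post_operative
        and not exam.follow_up
        and not exam.amputation
    )


exams = [
    Exam(patient_age=45),
    Exam(patient_age=16),
    Exam(patient_age=60, post_operative=True),
    Exam(patient_age=30, follow_up=True),
]
eligible = [exam for exam in exams if is_eligible(exam)]
```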