Large Language Model Automated Extraction of Clinical Signs and Symptoms From Emergency Department Reports for Machine Learning Prediction Models: Development and Validation Study

By
Anoeska Schipper
Peter Belgers
Rory David O'Connor
Lieke van de Wouw
Luc Builtjes
Joeran S Bosma
Ron Kusters
Steef Kurstjens
Matthieu Rutten
Bram van Ginneken
April 30, 2026

JMIR Medical Informatics

Objective:

To evaluate whether a small multilingual LLM (Qwen 2.5:14B) can automatically extract clinical features from Dutch emergency department (ED) reports and provide reliable inputs for a prediction model for acute abdominal pain (AAP), a presentation in which timely and accurate patient management is critical.
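As a rough illustration of what "extracting clinical features as model inputs" can involve, the sketch below builds a prompt for a report and converts a JSON reply into binary features. It is a minimal sketch under stated assumptions, not the study's pipeline: the feature names, the prompt wording, and the helpers `build_prompt` and `parse_features` are all hypothetical, and the call to the model itself is deliberately left out.

```python
import json

# Hypothetical feature names for illustration only; the study's actual
# feature list is not given in this summary.
FEATURES = ["fever", "nausea", "rebound_tenderness"]


def build_prompt(report_text):
    """Build an extraction prompt asking for one JSON boolean per feature."""
    feature_list = ", ".join(FEATURES)
    return (
        "Read the emergency department report below and answer with a JSON "
        "object containing true, false, or null for each of these features: "
        f"{feature_list}.\n\nReport:\n{report_text}"
    )


def parse_features(raw_response):
    """Convert the model's reply into a dict of 1, 0, or None per feature.

    Anything that is not a JSON boolean (missing key, null, free text,
    malformed JSON) is treated as missing rather than guessed.
    """
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    parsed = {}
    for name in FEATURES:
        value = data.get(name)
        parsed[name] = int(value) if isinstance(value, bool) else None
    return parsed
```

Mapping unparseable or absent answers to a missing value, rather than defaulting to "absent", keeps extraction failures visible to whatever prediction model consumes the features.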

Key Findings:

Interrater agreement for manually annotated features was high (Krippendorff α of 0.93 for binary features), indicating reliable reference annotations.
LLM-based feature extraction achieved accuracy comparable to physician annotations, suggesting potential for clinical application.
The study supports the feasibility of using LLMs for scalable, privacy-preserving workflows in ED decision support, which could enhance patient outcomes.
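For readers unfamiliar with the agreement statistic cited above, the following is a minimal sketch of Krippendorff's α for nominal (including binary) ratings, computed from a coincidence matrix. The function name and the input layout (one list of ratings per annotated item, with `None` for a missing rating) are assumptions made for this example; it does not reproduce the study's annotation data or tooling.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the ratings that the
    annotators gave to one item (None marks a missing rating).
    """
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two ratings cannot be paired
        # Every ordered pair of ratings from different annotators
        # contributes 1 / (m - 1) to the coincidence matrix.
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and overall number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more ratings")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1.0 - observed / expected
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement; for instance, `krippendorff_alpha_nominal([[1, 1, 1], [0, 0, 0]])` evaluates to 1.0.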

Interpretation:

Using a small multilingual LLM to extract features from ED reports is promising: it could be integrated into clinical workflows to streamline processes and improve patient care.

Limitations:

The study focused on a specific clinical use case (AAP) and may not generalize to other conditions.
Results are based on a single hospital's data, which may limit external validity.
Potential biases in data collection or annotation could affect the reliability of the findings.

Conclusion:

Automated extraction of clinical features using LLMs can enhance data usability in emergency medicine, supporting improved decision-making processes.

Original Source(s)

Large Language Model Automated Extraction of Clinical Signs and Symptoms From Emergency Department Reports for Machine Learning Prediction Models: Development and Validation Study