Large Language Model Automated Extraction of Clinical Signs and Symptoms From Emergency Department Reports for Machine Learning Prediction Models: Development and Validation Study

By Anoeska Schipper, Peter Belgers, Rory David O'Connor, Lieke van de Wouw, Luc Builtjes, Joeran S Bosma, Ron Kusters, Steef Kurstjens, Matthieu Rutten, Bram van Ginneken

April 30, 2026

Objective:

To evaluate whether a small multilingual LLM (Qwen 2.5:14B) can automatically extract clinical features from Dutch emergency department (ED) reports and provide reliable inputs for a machine learning prediction model for acute abdominal pain (AAP), a presentation in which timely and accurate management matters.
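As a rough illustration of what feeding LLM output into a prediction model can involve, the sketch below turns a model's JSON reply into a fixed set of binary features. The feature names, the JSON shape, and the `parse_features` helper are hypothetical, chosen only for this example; the summary does not describe the study's actual prompt or schema.

```python
import json

# Hypothetical feature names, for illustration only (not taken from the study).
FEATURES = ["fever", "nausea", "rebound_tenderness"]


def parse_features(reply: str) -> dict:
    """Map a model's JSON reply onto a fixed set of binary features.

    Anything that is missing, malformed, or not a strict boolean becomes
    None, so the downstream model sees an explicit "unknown" value.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    features = {}
    for name in FEATURES:
        value = data.get(name)
        features[name] = value if isinstance(value, bool) else None
    return features


row = parse_features('{"fever": true, "nausea": false}')
```

Keeping "not reported" distinct from "absent" is a design choice in this sketch: a report that never mentions fever is not the same as one that rules it out.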

Key Findings:
  • High interrater agreement for manually annotated features (Krippendorff α values of 0.93 for binary features), indicating strong reliability.
  • LLM-based feature extraction achieved accuracy comparable to physician annotations, suggesting potential for clinical application.
  • The study supports the feasibility of using LLMs for scalable, privacy-preserving workflows in ED decision support, which could enhance patient outcomes.
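For readers unfamiliar with the agreement statistic quoted above, here is a minimal sketch of Krippendorff's α for nominal (including binary) labels, using the standard coincidence-matrix formulation. The function name and input layout are assumptions made for this example, and the toy labels are invented; they are not data from the study.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists: each inner list holds the labels that the
    raters gave to one report, with None marking a missing rating.
    """
    # Build the coincidence matrix: every ordered pair of ratings within a
    # unit contributes 1 / (m - 1), where m is the number of ratings there.
    coincidences = Counter()
    for labels in units:
        values = [v for v in labels if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two ratings cannot be paired
        for i, j in permutations(range(m), 2):
            coincidences[(values[i], values[j])] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable ratings.
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable ratings")

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one label ever used: alpha is undefined
    return 1.0 - observed / expected


# Toy example: two raters, four reports, one disagreement.
alpha = krippendorff_alpha_nominal([[0, 0], [1, 1], [0, 1], [1, 1]])
```

An α of 1 means perfect agreement and 0 means agreement no better than chance, which is why a value such as 0.93 is read as strong reliability.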
Interpretation:

Using a small multilingual LLM to extract features from ED reports appears promising and could be integrated into clinical workflows, which may streamline processes and improve patient care.

Limitations:
  • The study focused on a specific clinical use case (AAP) and may not generalize to other conditions.
  • Results are based on a single hospital's data, which may limit external validity.
  • Potential biases in data collection or annotation could affect the reliability of the findings.
Conclusion:

Automated extraction of clinical features using LLMs can enhance data usability in emergency medicine, supporting improved decision-making processes.
