Evaluation of Large Language Models for Structured Data Extraction From Interstitial Lung Disease Clinical Notes: Comparative Study - Summary - MDSpire

By Stephanie Ji Chen, Manoj Venkat Maddali, Curtis Langlotz, Christian Bluethgen, Jonathan Chen, Rishi Raj

June 26, 2026

Objective:

To assess the performance characteristics of current large language models (LLMs) in extracting structured binary data from clinical notes for patients with interstitial lung disease (ILD).

Approach:
  • Cohort Selection: Patients were selected from the Stanford Interstitial Lung Disease Clinic between 2018 and 2022, with 10 patients for prompt engineering and 100 for LLM evaluation.
  • Prompt Engineering: An iterative approach was used to create prompts for binary responses, evolving from simple questions to chain-of-thought and heuristic prompts.
  • Model Evaluation: LLMs were tested on the 100-patient evaluation cohort to determine how accurately they extract yes/no answers to key ILD clinical questions.
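The three prompt styles named above can be illustrated with a minimal Python sketch. The summary does not reproduce the study's actual prompts or questions, so the wording, the function names `build_prompt` and `parse_binary`, and the "Final answer:" convention are all assumptions made for illustration only.

```python
import re

def build_prompt(note, question, style="simple", heuristics=None):
    """Build a yes/no extraction prompt in one of three illustrative styles.

    The templates are hypothetical; they are not the prompts used in the study.
    """
    base = f"Clinical note:\n{note}\n\nQuestion: {question}\n"
    if style == "simple":
        return base + "Answer with exactly one word: yes or no."
    if style == "chain_of_thought":
        return base + (
            "Reason step by step, then end with a line "
            "'Final answer: yes' or 'Final answer: no'."
        )
    if style == "heuristic":
        # Heuristics are explicit decision rules supplied by a domain expert.
        rules = "\n".join(f"- {h}" for h in (heuristics or []))
        return base + (
            f"Apply these rules when deciding:\n{rules}\n"
            "End with a line 'Final answer: yes' or 'Final answer: no'."
        )
    raise ValueError(f"unknown prompt style: {style}")

def parse_binary(response):
    """Map a model response to True/False, or None if no clear answer is found."""
    # Prefer an explicit final-answer line; take the last one if several appear.
    finals = re.findall(r"final answer:\s*(yes|no)\b", response, flags=re.IGNORECASE)
    if finals:
        return finals[-1].lower() == "yes"
    # Otherwise accept a bare one-word reply such as "Yes." or "no".
    bare = re.fullmatch(r"\s*(yes|no)\W*", response, flags=re.IGNORECASE)
    if bare:
        return bare.group(1).lower() == "yes"
    return None
```

Returning `None` for unparseable responses, rather than guessing, keeps ambiguous outputs visible so they can be counted or reviewed separately.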
Key Findings:
  • Heuristic prompts demonstrated the highest accuracy in extracting binary data.
  • Chain-of-thought prompts improved model reasoning but were less effective than heuristic prompts.
  • LLMs can reliably extract structured data from verbose clinical notes when provided with well-crafted prompts.
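Comparing prompt styles on accuracy requires scoring model answers against chart-review labels. The sketch below shows one way this could be done; the summary reports only that accuracy was compared, so the additional metrics, the function name `binary_metrics`, and the choice to count unparseable answers (`None`) as errors are assumptions rather than the study's method.

```python
def binary_metrics(predictions, labels):
    """Score yes/no predictions against reference labels.

    predictions: list of True, False, or None (None = no parseable answer).
    labels: list of True/False reference answers.
    A None prediction is counted as an error for whichever class the label has.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")

    tp = tn = fp = fn = 0
    for pred, label in zip(predictions, labels):
        if label:
            if pred is True:
                tp += 1
            else:
                fn += 1  # answered "no" or gave no parseable answer
        else:
            if pred is False:
                tn += 1
            else:
                fp += 1  # answered "yes" or gave no parseable answer

    def ratio(num, den):
        # Return None instead of dividing by zero when a class is absent.
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

Running this once per prompt style on the same notes and labels would give directly comparable scores.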
Interpretation:

Limitations:
  • The study focused solely on binary data extraction and did not explore nuances in ILD classification.
  • Results were based on a small, single-clinic cohort (100 patients in the evaluation set) and a specific set of clinical questions.
Conclusion:
