Evaluation of Large Language Models for Structured Data Extraction From Interstitial Lung Disease Clinical Notes: Comparative Study

By
Stephanie Ji Chen
Manoj Venkat Maddali
Curtis Langlotz
Christian Bluethgen
Jonathan Chen
Rishi Raj
June 26, 2026

Journal of Medical Internet Research (JMIR)

Objective:

To assess the performance characteristics of current large language models (LLMs) in extracting structured binary data from clinical notes for patients with interstitial lung disease (ILD).

Approach:

Cohort Selection: Patients were selected from the Stanford Interstitial Lung Disease Clinic between 2018 and 2022, with 10 patients for prompt engineering and 100 for LLM evaluation.
Prompt Engineering: An iterative approach was used to create prompts for binary responses, evolving from simple questions to chain-of-thought and heuristic prompts.
Model Evaluation: LLMs were tested on the prompt engineering cohort to determine their ability to extract yes/no answers regarding key ILD clinical questions.
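The three prompt styles named above can be sketched as templates, together with a parser that reduces a free-text model reply to a yes/no value. This is a minimal illustration only: the template wording, the names `TEMPLATES`, `build_prompt`, and `parse_binary`, and the "Final answer" convention are assumptions made for this sketch, not the prompts or code used in the study.

```python
import re

# Hypothetical templates; the study's actual prompt wording is not reproduced here.
TEMPLATES = {
    "simple": (
        "Clinical note:\n{note}\n\nQuestion: {question}\n"
        "Answer with Yes or No."
    ),
    "chain_of_thought": (
        "Clinical note:\n{note}\n\nQuestion: {question}\n"
        "Reason step by step about the evidence in the note, then finish "
        "with a line of the form 'Final answer: Yes' or 'Final answer: No'."
    ),
    "heuristic": (
        "Clinical note:\n{note}\n\nQuestion: {question}\n"
        "Apply these rules when answering:\n{rules}\n"
        "Finish with a line of the form 'Final answer: Yes' or 'Final answer: No'."
    ),
}


def build_prompt(style, note, question, rules=()):
    """Fill in one of the templates; `rules` is only used by the heuristic style."""
    rule_text = "\n".join(f"- {rule}" for rule in rules)
    return TEMPLATES[style].format(note=note, question=question, rules=rule_text)


def parse_binary(response):
    """Return True for yes, False for no, or None if neither word appears.

    The last standalone 'yes' or 'no' in the reply is taken as the answer,
    so reasoning text that precedes the final line does not decide the result.
    """
    matches = re.findall(r"\b(yes|no)\b", response.lower())
    if not matches:
        return None
    return matches[-1] == "yes"
```

Returning `None` for replies without a clear answer keeps unparseable outputs visible instead of silently counting them as "no".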

Key Findings:

Heuristic prompts demonstrated the highest accuracy in extracting binary data.
Chain-of-thought prompts improved model reasoning but were less effective than heuristic prompts.
LLMs can reliably extract structured data from verbose clinical notes when provided with well-crafted prompts.
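Accuracy for a binary extraction task of this kind is computed by comparing model answers against reference labels. The sketch below shows one way to do that; the function name `binary_metrics` is hypothetical, the inclusion of sensitivity and specificity is an assumption about which measures are useful here, and the labels in the usage lines are toy values, not study data.

```python
def binary_metrics(predictions, references):
    """Compare predicted yes/no values (booleans) against reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one labelled example is required")

    pairs = list(zip(predictions, references))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives
    fp = sum(1 for p, r in pairs if p and not r)      # false positives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives

    return {
        "accuracy": (tp + tn) / len(pairs),
        # None when the reference set has no positive (or negative) cases.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Toy labels for illustration only.
metrics = binary_metrics(
    predictions=[True, False, True, False],
    references=[True, False, False, False],
)
print(metrics["accuracy"])  # 0.75
```

Running the same function once per prompt style on the same notes gives directly comparable numbers for each style.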

Limitations:

The study focused solely on binary data extraction and did not explore nuances in ILD classification.
Results were based on a small, single-clinic cohort (110 patients in total) and a specific set of clinical questions.

Sources:

Stanford University Institutional Review Board
