Evaluation of Large Language Models for Structured Data Extraction From Interstitial Lung Disease Clinical Notes: Comparative Study - Summary - MDSpire
Objective:
To assess how well current large language models (LLMs) extract structured binary data from clinical notes for patients with interstitial lung disease (ILD).
Approach:
Cohort Selection: Patients were selected from the Stanford Interstitial Lung Disease Clinic between 2018 and 2022, with 10 patients for prompt engineering and 100 for LLM evaluation.
Prompt Engineering: An iterative approach was used to create prompts for binary responses, evolving from simple questions to chain-of-thought and heuristic prompts.
Model Evaluation: LLMs were tested on the 100-patient evaluation cohort to determine how accurately they extract yes/no answers to key ILD clinical questions.
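The three prompt styles named above can be illustrated with a minimal Python sketch. This summary does not give the study's actual prompt wording, so the template text, the function names (build_prompt, parse_binary), and the "Final answer:" convention are all assumptions made for illustration only.

```python
import re

# Hypothetical templates: the study's real prompt wording is not given here.
TEMPLATES = {
    "simple": (
        "Clinical note:\n{note}\n\n"
        "Question: {question}\n"
        "Answer with yes or no."
    ),
    "chain_of_thought": (
        "Clinical note:\n{note}\n\n"
        "Question: {question}\n"
        "Reason step by step about the evidence in the note, then finish "
        "with a line of the form 'Final answer: yes' or 'Final answer: no'."
    ),
    "heuristic": (
        "Clinical note:\n{note}\n\n"
        "Question: {question}\n"
        "Apply these rules when deciding:\n{rules}\n"
        "Finish with a line of the form 'Final answer: yes' or 'Final answer: no'."
    ),
}


def build_prompt(style, note, question, rules=()):
    """Fill in one of the templates; rules are only used by the heuristic style."""
    rule_text = "\n".join(f"- {rule}" for rule in rules)
    return TEMPLATES[style].format(note=note, question=question, rules=rule_text)


def parse_binary(response):
    """Map a model response to True (yes), False (no), or None (no clear answer)."""
    # Prefer an explicit final-answer line; take the last one if there are several.
    finals = re.findall(r"final answer:\s*(yes|no)\b", response, flags=re.IGNORECASE)
    if finals:
        return finals[-1].lower() == "yes"
    # Otherwise accept a response that simply starts with yes or no.
    leading = re.match(r"(yes|no)\b", response.strip().lower())
    if leading:
        return leading.group(1) == "yes"
    return None
```

Returning None for an unparseable response, rather than guessing, keeps ambiguous model output visible so it can be counted separately during evaluation.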
Key Findings:
Heuristic prompts demonstrated the highest accuracy in extracting binary data.
Chain-of-thought prompts improved model reasoning but were less effective than heuristic prompts.
LLMs can reliably extract structured data from verbose clinical notes when provided with well-crafted prompts.
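As a rough sketch of how accuracy could be compared across prompt styles, the Python function below scores binary predictions against clinician labels. The summary reports accuracy only in qualitative terms, so the function name, the metric set, and the choice to count unparseable responses as errors are assumptions, not the study's documented method.

```python
def binary_metrics(predictions, labels):
    """Score yes/no predictions (True, False, or None) against True/False labels.

    Assumption: a prediction of None (no parseable answer) counts as incorrect.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")

    tp = tn = unparsed = 0
    for pred, label in zip(predictions, labels):
        if pred is None:
            unparsed += 1
        elif pred and label:
            tp += 1  # correctly answered yes
        elif not pred and not label:
            tn += 1  # correctly answered no

    total = len(labels)
    positives = sum(1 for label in labels if label)
    negatives = total - positives
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / positives if positives else 0.0,
        "specificity": tn / negatives if negatives else 0.0,
        "unparsed": unparsed,
    }
```

Running this once per prompt style on the same labeled notes would give directly comparable numbers for simple, chain-of-thought, and heuristic prompts.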
Interpretation:
Limitations:
The study focused solely on binary data extraction and did not explore nuances in ILD classification.
Results were based on a limited cohort size and specific clinical questions.