Screening for Missed Opportunities for Diagnosis in the ED Using eTriggers and Large Language Models

By
Clifford M. Marks
Sean Gibney
Bryan Stenson
Deesha Sarma
Cynthia Gaudet
Haadi Mombini
Thomas A. Buckley
Mario Keko
Larry A. Nathanson
Laura G. Burke
Nathan I. Shapiro
Jonathan L. Burstein
Shamai A. Grossman
Anika Parab
Alexander T. Janke
Arjun K. Manrai
Richard A. Taylor
Carlo L. Rosen
Adam Rodman
Adrian D. Haimovich
June 29, 2026

JAMA Network Open

Objective:

To compare the performance of commercially available large language models (LLMs) in identifying missed opportunities for diagnosis (MODs) in emergency department (ED) encounters drawn from established eTrigger cohorts.

Approach:

Study Design: A retrospective diagnostic study was conducted using encounters from 9 hospitals within the Beth Israel Lahey Health enterprise, focusing on two established eTrigger cohorts.
Data Collection: Data were extracted from the electronic health record (EHR) and included demographic information such as age, sex, race, and ethnicity.
Model Evaluation: The study evaluated LLMs based on sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), area under the receiver operating characteristic curve (AUC), and concordance with physician reviewers.
Review Process: Two emergency physicians independently reviewed cases to determine if there was a missed opportunity for diagnosis, using structured instructions and a rubric for adjudication.
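
The evaluation metrics named above can be illustrated with a minimal sketch. It assumes the physician adjudication serves as the reference label (1 = MOD, 0 = no MOD) and that the LLM returns either a binary call or a numeric score. The function names and the data layout are hypothetical, chosen for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a metric is undefined
    # (for example, PPV when the model flags no cases).
    return numerator / denominator if denominator else float("nan")


def screening_metrics(reference, predicted):
    """Confusion-matrix metrics for binary labels.

    reference: physician adjudication (1 = MOD, 0 = no MOD); assumed reference standard.
    predicted: binary LLM call for the same encounters, in the same order.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    tn = sum(1 for r, p in pairs if not r and not p)
    return ScreeningMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


def auc(reference, scores):
    """Rank-based AUC: the probability that a randomly chosen MOD case
    receives a higher score than a randomly chosen non-MOD case,
    with ties counted as one half."""
    positives = [s for r, s in zip(reference, scores) if r]
    negatives = [s for r, s in zip(reference, scores) if not r]
    if not positives or not negatives:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Illustrative labels only; these are not study data.
    physician = [1, 1, 0, 0, 0, 1, 0, 0]
    llm_call = [1, 0, 0, 0, 1, 1, 0, 0]
    print(screening_metrics(physician, llm_call))
```

Because PPV and NPV depend on how common MODs are in the reviewed sample, values computed on an enriched eTrigger cohort would not necessarily carry over to an unselected ED population.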

Key Findings:

eTriggers are effective in identifying cases at increased risk for MODs but have low yields even in enriched cohorts.
LLMs can assist in diagnostic safety review, but their performance varies across different models and clinical cohorts.
The study followed TRIPOD-LLM guidelines for transparent reporting.

Interpretation:

The study aimed to understand how LLMs can enhance the identification of MODs in EDs, comparing their performance against traditional review methods.

Limitations:

The study drew on encounters from a single health system, which may limit generalizability; it was exempt from review and informed consent.
The sample size was pragmatic and may not represent population-level trigger prevalence.

Conclusion:

The study provides insights into the comparative evaluation of LLMs for identifying MODs in emergency departments, highlighting the potential of AI in improving diagnostic safety.
