Screening for Missed Opportunities for Diagnosis in the ED Using eTriggers and Large Language Models - Summary - MDSpire

By Clifford M. Marks, Sean Gibney, Bryan Stenson, Deesha Sarma, Cynthia Gaudet, Haadi Mombini, Thomas A. Buckley, Mario Keko, Larry A. Nathanson, Laura G. Burke, Nathan I. Shapiro, Jonathan L. Burstein, Shamai A. Grossman, Anika Parab, Alexander T. Janke, Arjun K. Manrai, Richard A. Taylor, Carlo L. Rosen, Adam Rodman, Adrian D. Haimovich

June 29, 2026

Objective:

To compare the performance of commercially available large language models (LLMs) in identifying missed diagnostic opportunities (MODs) in emergency department (ED) settings using established eTrigger cohorts.

Approach:
  • Study Design: A retrospective diagnostic study was conducted using encounters from 9 hospitals within the Beth Israel Lahey Health enterprise, focusing on two established eTrigger cohorts.
  • Data Collection: Data was extracted from electronic health records (EHR) and included demographic information such as age, sex, race, and ethnicity.
  • Model Evaluation: The study evaluated LLMs based on sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), area under the receiver operating characteristic curve (AUC), and concordance with physician reviewers.
  • Review Process: Two emergency physicians independently reviewed each case to determine whether a missed opportunity for diagnosis occurred, using structured instructions and an adjudication rubric.
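The evaluation metrics listed above can all be computed from paired per-encounter labels. The following is a minimal Python sketch, not the study's code. It assumes that each encounter has an adjudicated physician label (the reference standard) and an LLM output consisting of a binary MOD call plus a numeric score. The summary does not say how LLM outputs were scored or which concordance statistic was used, so Cohen's kappa is an assumed choice here. All function names and the toy data are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass
class ConfusionCounts:
    tp: int  # LLM flags a MOD; physician reference says MOD
    fp: int  # LLM flags a MOD; physician reference says no MOD
    fn: int  # LLM says no MOD; physician reference says MOD
    tn: int  # both say no MOD


def confusion_counts(llm: list[bool], reference: list[bool]) -> ConfusionCounts:
    """Tally binary LLM calls against the physician reference standard."""
    if len(llm) != len(reference):
        raise ValueError("llm and reference must have the same length")
    pairs = list(zip(llm, reference))
    return ConfusionCounts(
        tp=sum(1 for p, r in pairs if p and r),
        fp=sum(1 for p, r in pairs if p and not r),
        fn=sum(1 for p, r in pairs if not p and r),
        tn=sum(1 for p, r in pairs if not p and not r),
    )


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when the denominator is zero.
    return num / den if den else math.nan


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


def cohens_kappa(a: list[bool], b: list[bool]) -> float:
    """Chance-corrected agreement between two binary raters (LLM vs. physician, or two physicians)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    return _ratio(observed - expected, 1 - expected)


def auc_rank(scores: list[float], reference: list[bool]) -> float:
    """AUC as P(random MOD case outscores random non-MOD case); ties count 1/2."""
    pos = [s for s, r in zip(scores, reference) if r]
    neg = [s for s, r in zip(scores, reference) if not r]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p > q else (0.5 if p == q else 0.0) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not study results.
    reference = [True, True, True, False, False, False, False, False, False, False]
    llm_flag = [True, True, False, True, False, False, False, False, False, False]
    llm_score = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.1, 0.05, 0.0]

    print(screening_metrics(confusion_counts(llm_flag, reference)))
    print("kappa:", round(cohens_kappa(llm_flag, reference), 3))
    print("AUC:", round(auc_rank(llm_score, reference), 3))
```

The rank-based AUC equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. Its pairwise loop is quadratic in cohort size, which is adequate for modest review samples. A library routine such as scikit-learn's `roc_auc_score` is an alternative.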
Key Findings:
  • eTriggers are effective in identifying cases at increased risk for MODs but have low yields even in enriched cohorts.
  • LLMs can assist in diagnostic safety review, but their performance varies across different models and clinical cohorts.
  • The study followed TRIPOD-LLM guidelines for transparent reporting.
Interpretation:

The study examined whether LLMs can improve the identification of MODs in EDs by comparing their classifications against physician review.

Limitations:
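  • The study was exempt from institutional review and informed consent requirements. Because all encounters came from 9 hospitals within a single health system, the findings may not generalize to other settings.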
  • The sample size was pragmatic and may not represent population-level trigger prevalence.
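PPV and NPV depend on how common MODs are among the screened encounters. As a result, values measured in a pragmatically sized, trigger-enriched sample do not carry over directly to a population with a different MOD prevalence. The following is a minimal Python sketch of the standard Bayes'-rule adjustment. It assumes that sensitivity and specificity stay fixed as prevalence changes, which is a common simplification that may not hold in practice. The function names and numbers are hypothetical and are not values from the study.

```python
import math


def _check(sensitivity: float, specificity: float, prevalence: float) -> None:
    # Sensitivity and specificity are probabilities; prevalence must be strictly between 0 and 1.
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of LLM-flagged encounters that are true MODs at the given prevalence."""
    _check(sensitivity, specificity, prevalence)
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    den = true_pos + false_pos
    return true_pos / den if den else math.nan


def npv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of LLM-cleared encounters that truly have no MOD at the given prevalence."""
    _check(sensitivity, specificity, prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    den = true_neg + false_neg
    return true_neg / den if den else math.nan


if __name__ == "__main__":
    # Hypothetical operating point, not a result from the study.
    sens, spec = 0.90, 0.90
    for prev in (0.50, 0.20, 0.05):
        print(
            f"prevalence={prev:.2f}  "
            f"PPV={ppv_at_prevalence(sens, spec, prev):.3f}  "
            f"NPV={npv_at_prevalence(sens, spec, prev):.3f}"
        )
```

At a hypothetical 90% sensitivity and specificity, PPV falls from 0.90 at 50% prevalence to about 0.32 at 5% prevalence. This drop is why a low-yield screening population can make even a reasonably accurate model produce mostly false-positive flags.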
Conclusion:

The study compares commercially available LLMs for identifying MODs in emergency department encounters flagged by established eTriggers. Its findings suggest that LLMs could support diagnostic safety review, although performance varies across models and clinical cohorts.
