One Step Closer to Real-Time Detection of Missed Opportunities for Diagnosis in the ED Using LLMs
By Fernanda Bellolio and Daniel Cabrera
June 29, 2026
Objective: To evaluate the effectiveness of large language models (LLMs) in identifying missed opportunities for diagnosis in the emergency department.
Study Design: The study evaluated 6 commercially available LLMs on their ability to identify missed diagnostic opportunities in 288 emergency department encounters.
Key Findings: The overall prevalence of missed opportunities for diagnosis was 13.5%. Areas under the receiver operating characteristic curve (AUCs) ranged from 0.65 to 0.73 in the 72-hour return cohort and from 0.57 to 0.82 in the floor-to-ICU cohort. The models showed different sensitivity-specificity tradeoffs, with Claude Sonnet 4 favoring sensitivity and GPT-5 mini favoring specificity. Physician interrater agreement was 81.9%, which indicates that expert reviewers themselves disagreed on a meaningful share of cases.
Interpretation: The study indicates that LLMs can detect missed diagnostic opportunities, a low-prevalence outcome, using unstructured clinical notes.
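To illustrate why the sensitivity-specificity tradeoff matters at a 13.5% prevalence, the sketch below converts an operating point into positive and negative predictive values using Bayes' rule. Only the prevalence comes from the study; the two sensitivity/specificity pairs are hypothetical values chosen for illustration and are not the reported performance of any model.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary screen at a given prevalence."""
    # Expected fraction of all encounters falling in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)  # P(missed opportunity | flagged)
    npv = true_neg / (true_neg + false_neg)  # P(no missed opportunity | not flagged)
    return ppv, npv


PREVALENCE = 0.135  # overall prevalence reported in the study

# Hypothetical operating points (assumptions, not results from the study).
operating_points = {
    "sensitivity-leaning": (0.85, 0.55),
    "specificity-leaning": (0.45, 0.90),
}

for label, (sens, spec) in operating_points.items():
    ppv, npv = predictive_values(sens, spec, PREVALENCE)
    print(f"{label}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

Under these assumed values, the sensitivity-leaning point flags more true cases but most of its flags are false alarms, while the specificity-leaning point produces fewer, more reliable flags at the cost of missing more cases. Which tradeoff is preferable depends on how the screening output would be used.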
Limitations: The study is retrospective in nature. Aggregate discrimination metrics alone are insufficient to determine whether a model is appropriate for a clinical task.
Conclusion: The findings indicate progress towards the deployment of LLM-based screening tools for real-time diagnostic safety in emergency medicine.