Across six experiments, including a blinded, real-world ER evaluation, an OpenAI large language model outperformed physician baselines on multiple clinical reasoning tasks, though not on key safety endpoints such as cannot-miss diagnoses.
Evidence suggests improved outcomes in selected patients with cardiac arrest, but limited data, complications, and resource demands may restrict broader use.