Two-Phase Blinded Assessment of Large Language Models in Complex Cardiac Surgery: Evaluating Task-Specific Efficacy and Collaboration with Clinicians

By Marc Leon, Ruibin Feng, Manuel Quiroz Flores, Glenn Pelletier, Daniel Bethencourt, Masafumi Shibata, Hao He, Chawannuch Ruaengsri
May 29, 2026

Frontiers in Digital Health

1. A two-phase evaluation framework was developed to assess large language models (LLMs) and human-LLM collaboration in complex cardiac surgery.
2. Senior surgeons created fifteen high-fidelity cardiac surgery scenarios, each paired with a reasoning task and expert-curated reference answers.
3. LLM performance varied across scenarios: median normalized scores ranged from 0.896 (O1, the highest) to 0.521 (Llama3-OpenBioLLM-70B, the lowest).
4. In the second-round evaluation, scores declined for four LLMs, and a notable percentage of ratings were revised from affirmative to negative.
5. All models showed clinical limitations, particularly on complex reasoning tasks, indicating that they are not yet ready for safe use in surgical settings.

Original Source(s)

Frontiers in Digital Health

Blinded two-phase evaluation of large language models in complex cardiac surgery: task-specific performance and human-AI collaboration

by Marc Leon, Ruibin Feng, Manuel Quiroz Flores, Glenn Pelletier, Daniel Bethencourt, Masafumi Shibata, Hao He, Chawannuch Ruaengsri
May 29, 2026
