Blinded two-phase evaluation of large language models in complex cardiac surgery: task-specific performance and human-AI collaboration - Scorecard - MDSpire
Clinical Scorecard: Two-Phase Blinded Assessment of Large Language Models in Complex Cardiac Surgery: Evaluating Task-Specific Efficacy and Collaboration with Clinicians
At a Glance
Category
Detail
Condition
Key Mechanisms
Evaluation of large language models (LLMs) in surgical decision-making and human-LLM collaboration.
Target Population
Care Setting
Key Highlights
LLM performance varied, with O1 scoring highest (0.896) and Llama3-OpenBioLLM-70B lowest (0.521).