Clinical Report: Evaluating LLMs for Brain MRI Protocol Generation
Overview
This study assessed the performance of four large language models (LLMs), GPT-4o, o3-mini, DeepSeek-R1, and Qwen2.5-72B, in generating detailed brain MRI protocols from realistic clinical case descriptions. Each model was evaluated both with and without enhanced contextual information against reference protocols established by experienced neuroradiologists, and its output was compared with protocols written by radiology residents.
Background
Brain MRI protocoling is a critical yet time-consuming task requiring radiologists to balance comprehensive imaging with efficiency to avoid repeat examinations and reduce costs. Errors in protocoling are a leading cause of callback MRI scans. With increasing MRI demand and radiologist workload, AI tools, including LLMs, have been explored to assist in protocol selection. Prior studies have focused on modality or single sequence suggestions, but granular sequence-level protocol generation based on realistic clinical cases remains underexplored.
Data Highlights
Model | Type | Access | Temperature | Query Date
GPT-4o | Closed-weight | OpenAI API | 0 | Feb 6, 2025
o3-mini | Closed-weight | OpenAI API | Not supported | Feb 16, 2025
DeepSeek-R1 | Open-weight | Fireworks AI | 0 | Feb 6, 2025
Qwen2.5-72B | Open-weight | Fireworks AI | 0 | Feb 6, 2025
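The model settings listed above can be captured in a small configuration structure, which makes the o3-mini exception (no temperature parameter) explicit. The following is a minimal sketch, not the study's code: the class name ModelConfig, the helper sampling_params, and the ISO date format are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelConfig:
    """One row of the model table (field names are illustrative)."""
    name: str
    weights: str                  # "closed" or "open"
    access: str                   # provider used to query the model
    temperature: Optional[float]  # None means the parameter is not supported
    query_date: str               # ISO date of the query


MODELS = [
    ModelConfig("GPT-4o", "closed", "OpenAI API", 0.0, "2025-02-06"),
    ModelConfig("o3-mini", "closed", "OpenAI API", None, "2025-02-16"),
    ModelConfig("DeepSeek-R1", "open", "Fireworks AI", 0.0, "2025-02-06"),
    ModelConfig("Qwen2.5-72B", "open", "Fireworks AI", 0.0, "2025-02-06"),
]


def sampling_params(cfg: ModelConfig) -> dict:
    """Return sampling parameters, omitting temperature when unsupported."""
    params = {}
    if cfg.temperature is not None:
        params["temperature"] = cfg.temperature
    return params
```

Keeping the unsupported case as None, rather than silently defaulting to 0, avoids sending a parameter that a model does not accept.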
Key Findings
Two board-certified neuroradiologists established reference brain MRI protocols for 150 anonymized, categorized clinical cases, with consensus adjudication for disagreements.
LLMs generated brain MRI protocols under two conditions: base (without external info) and enhanced (with local standard protocols and sequence explanations).
GPT-4o and o3-mini are closed-weight models accessed via OpenAI API; DeepSeek-R1 and Qwen2.5-72B are open-weight models accessed via Fireworks AI.
Structured JSON output mode and a temperature of 0 (for all models except o3-mini, which does not support a temperature setting) were used to keep protocol generation consistent and analyzable.
Radiology residents also generated protocols, providing a human baseline for assessing how far LLMs could support or augment human protocoling.
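The base and enhanced conditions, together with structured JSON output, can be illustrated as follows. This is a hedged sketch under stated assumptions: the prompt wording, the JSON key "sequences", and the function names build_prompt and parse_protocol are hypothetical and are not taken from the study. No model API is called here.

```python
import json

# Hypothetical instruction text; the study's actual prompt is not given here.
BASE_INSTRUCTION = (
    "You are assisting with brain MRI protocoling. "
    "Given the clinical case, return a JSON object with a single key "
    '"sequences" whose value is a list of MRI sequence names.'
)


def build_prompt(case_description, local_protocols=None, sequence_notes=None):
    """Build a base prompt, or an enhanced one when extra context is supplied."""
    parts = [BASE_INSTRUCTION]
    if local_protocols:
        parts.append("Local standard protocols:\n" + local_protocols)
    if sequence_notes:
        parts.append("Sequence explanations:\n" + sequence_notes)
    parts.append("Clinical case:\n" + case_description)
    return "\n\n".join(parts)


def parse_protocol(raw_reply):
    """Parse a model reply and return the list of sequence names.

    Raises ValueError if the reply does not have the assumed JSON shape.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    sequences = data.get("sequences") if isinstance(data, dict) else None
    if not isinstance(sequences, list) or not all(
        isinstance(s, str) for s in sequences
    ):
        raise ValueError('expected {"sequences": [<str>, ...]}')
    return sequences
```

Calling build_prompt with only a case description corresponds to the base condition; passing local_protocols and sequence_notes corresponds to the enhanced condition. Validating the reply shape before analysis means malformed outputs are counted explicitly instead of being silently dropped.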
Clinical Implications
LLMs show promise in automating the generation of detailed brain MRI protocols, potentially reducing radiologist workload and minimizing protocol errors that lead to repeat scans. Incorporating local protocol standards and sequence explanations enhances model performance, suggesting that tailored AI integration could improve clinical workflow efficiency. However, human oversight remains essential to ensure clinical appropriateness and safety.
Conclusion
This study demonstrates that state-of-the-art LLMs can generate clinically relevant brain MRI protocols from realistic case descriptions, with enhanced contextual input improving accuracy. These findings support further development and integration of LLM-based tools to assist radiologists in protocoling tasks.
References
Wong et al. 2023 -- AI in Brain MRI Protocol Classification
Suzuki et al. 2024 -- GPT-4 for Brain MRI Sequence Suggestion
by Su Hwan Kim, Severin Schramm, Lena Schmitzer, Kerem Serguen, Sebastian Ziegelmayer, Felix Busch, Alexander Komenda, Marcus R. Makowski, Lisa C. Adams, Keno K. Bressem, Claus Zimmer, Jan Kirschke, Benedikt Wiestler, Dennis Hedderich, Tom Finck, Jannis Bodden