Evaluating large language model-generated brain MRI protocols: performance of GPT4o, o3-mini, DeepSeek-R1 and Qwen2.5-72B

By
Su Hwan Kim
Severin Schramm
Lena Schmitzer
Kerem Serguen
Sebastian Ziegelmayer
Felix Busch
Alexander Komenda
Marcus R. Makowski
Lisa C. Adams
Keno K. Bressem
Claus Zimmer
Jan Kirschke
Benedikt Wiestler
Dennis Hedderich
Tom Finck
Jannis Bodden
September 3, 2025
0 min

European Radiology

Overview

This study assessed the performance of four large language models (LLMs)—GPT-4o, o3-mini, DeepSeek-R1, and Qwen2.5-72B—in generating detailed brain MRI protocols from realistic clinical case descriptions. The models were evaluated against reference protocols established by experienced neuroradiologists, with and without enhanced contextual information, and compared to protocols generated by radiology residents.

Background

Brain MRI protocoling is a critical yet time-consuming task requiring radiologists to balance comprehensive imaging with efficiency to avoid repeat examinations and reduce costs. Errors in protocoling are a leading cause of callback MRI scans. With increasing MRI demand and radiologist workload, AI tools, including LLMs, have been explored to assist in protocol selection. Prior studies have focused on modality or single sequence suggestions, but granular sequence-level protocol generation based on realistic clinical cases remains underexplored.

Data Highlights

Model	Type	Access	Temperature	Query Date
GPT-4o	Closed-weight	OpenAI API	0	Feb 6, 2025
o3-mini	Closed-weight	OpenAI API	Not supported	Feb 16, 2025
DeepSeek-R1	Open-weight	Fireworks AI	0	Feb 6, 2025
Qwen2.5-72B	Open-weight	Fireworks AI	0	Feb 6, 2025

Key Findings

Two board-certified neuroradiologists established reference brain MRI protocols for 150 anonymized, categorized clinical cases, with consensus adjudication for disagreements.
LLMs generated brain MRI protocols under two conditions: base (without external info) and enhanced (with local standard protocols and sequence explanations).
GPT-4o and o3-mini are closed-weight models accessed via OpenAI API; DeepSeek-R1 and Qwen2.5-72B are open-weight models accessed via Fireworks AI.
Structured JSON output mode and deterministic temperature settings were used to ensure consistent and analyzable protocol generation.
Radiology residents also generated protocols for comparison, highlighting the potential of LLMs to support or augment human protocoling.

Clinical Implications

LLMs show promise in automating the generation of detailed brain MRI protocols, potentially reducing radiologist workload and minimizing protocol errors that lead to repeat scans. Incorporating local protocol standards and sequence explanations enhances model performance, suggesting that tailored AI integration could improve clinical workflow efficiency. However, human oversight remains essential to ensure clinical appropriateness and safety.

Conclusion

This study demonstrates that state-of-the-art LLMs can generate clinically relevant brain MRI protocols from realistic case descriptions, with enhanced contextual input improving accuracy. These findings support further development and integration of LLM-based tools to assist radiologists in protocoling tasks.

References

Wong et al. 2023 -- AI in Brain MRI Protocol Classification
Suzuki et al. 2024 -- GPT-4 for Brain MRI Sequence Suggestion
OpenAI API Documentation 2024 -- GPT-4o and o3-mini Models
Fireworks AI Platform 2025 -- DeepSeek-R1 and Qwen2.5-72B Access

Evaluating large language model-generated brain MRI protocols: performance of GPT4o, o3-mini, DeepSeek-R1 and Qwen2.5-72B

Clinical Report: Evaluating LLMs for Brain MRI Protocol Generation

Overview

Background

Data Highlights

Key Findings

Clinical Implications

Conclusion

References

Original Source(s)

Evaluating large language model-generated brain MRI protocols: performance of GPT4o, o3-mini, DeepSeek-R1 and Qwen2.5-72B

Related Content

Polar-coordinated contour processing algorithm in optimizing SCART treatment volume

Early prediction of immune checkpoint inhibitor-related pneumonitis in advanced non-small cell lung cancer based on primary tumor Delta-radiomics features

Advances in Hodgkin Lymphoma Treatment: Clinical Considerations for Managing Toxicities in Nivolumab-AVD Therapy