Clinical Scorecard: Assessing the Efficacy of Large Language Models in Generating Brain MRI Protocols: A Comparison of GPT4o, o3-mini, DeepSeek-R1, and Qwen2.5-72B
At a Glance
Condition: Neurological conditions requiring brain MRI
Key Mechanisms: Use of large language models (LLMs) to generate granular, sequence-level brain MRI protocols based on clinical case descriptions
Target Population: Patients undergoing brain MRI for various neurological indications
Brain MRI protocoling is a complex, time-consuming task critical for diagnostic accuracy and efficiency.
Protocol errors are a leading cause of callback examinations, emphasizing the need for accurate protocol selection.
LLMs including GPT-4o, o3-mini, DeepSeek-R1, and Qwen2.5-72B were evaluated for their ability to generate brain MRI protocols using realistic clinical cases.
Guideline-Based Recommendations
Diagnosis
Use comprehensive clinical history and imaging request forms to guide MRI protocol selection.
Classify cases into categories (vascular, neoplasia, inflammation, degenerative, miscellaneous) to tailor protocols.
Management
Employ standardized imaging protocols for common clinical scenarios, with individualized adjustments for complex cases.
Consider AI tools such as LLMs to assist in protocol generation to reduce radiologist workload.
Monitoring & Follow-up
Evaluate inter-rater agreement on protocol sequences to ensure consistency and accuracy.
Monitor for protocol errors that may lead to repeat examinations.
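One way to quantify inter-rater agreement on protocol sequences is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The source does not state which statistic to use, so the choice of kappa is an assumption here, and the ratings below are hypothetical. A minimal sketch:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: 1 = sequence judged necessary, 0 = not necessary.
rater_a = [1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 1]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

Here the raters agree on 6 of 8 items (0.75), while chance agreement is about 0.53, so kappa lands well below the raw agreement rate.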
Risks
Omission of critical MRI sequences can necessitate repeat scans, increasing patient burden and healthcare costs.
Unnecessary sequences prolong scan time, and unnecessary contrast-enhanced sequences expose patients to contrast agents with potential adverse effects.
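Both risks can be checked by comparing a generated protocol against a reference protocol as sets of sequences. The sketch below is illustrative only: the function name `compare_protocol`, the sequence names, and the case-insensitive matching rule are assumptions, not the scoring method used in the study.

```python
def compare_protocol(generated, reference):
    """Return sequences missing from, extra in, and shared with the reference."""
    # Normalize names so that "FLAIR" and " flair " count as the same sequence.
    gen = {s.strip().lower() for s in generated}
    ref = {s.strip().lower() for s in reference}
    return {
        "missing": sorted(ref - gen),  # omissions: may force a repeat scan
        "extra": sorted(gen - ref),    # additions: longer scan, possible contrast
        "matched": sorted(gen & ref),
    }


# Hypothetical reference and model-generated protocols.
reference = ["T1", "T2", "FLAIR", "DWI", "SWI"]
generated = ["T1", "T2", "FLAIR", "DWI", "T1 post-contrast"]
result = compare_protocol(generated, reference)
print(result)
```

In this example the generated protocol omits SWI and adds a post-contrast T1, which illustrates one error of each type.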
Patient & Prescribing Data
150 fictitious brain MRI cases, derived from anonymized real patient data, representing typical and atypical clinical scenarios.
The evaluated LLMs generate brain MRI protocols with varying accuracy; prompts enhanced with local standard protocols improve performance.
Clinical Best Practices
Ensure anonymization and address ethical considerations when using patient data for AI model training and evaluation.
Use consensus among experienced neuroradiologists to establish reference protocols against which AI-generated protocols are validated.
Incorporate in-context learning with local standard protocols and sequence explanations to enhance LLM output quality.
Apply structured output modes (e.g., JSON schema) for programmatic analysis and integration of AI-generated protocols.
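The last two practices can be sketched together: a prompt that embeds local standard protocols as in-context examples, and a parser that checks the model's JSON reply before it is analyzed programmatically. Everything named here is hypothetical (`LOCAL_PROTOCOLS`, `build_prompt`, `parse_protocol`, the two-field JSON shape, and the sequence lists), and no real model API is called; the reply is a hard-coded stand-in.

```python
import json

# Hypothetical local standard protocols; real institutional protocols differ.
LOCAL_PROTOCOLS = {
    "stroke": ["DWI", "FLAIR", "SWI", "TOF-MRA"],
    "tumor": ["T1", "T2", "FLAIR", "DWI", "T1 post-contrast"],
}
ALLOWED_SEQUENCES = sorted({s for seqs in LOCAL_PROTOCOLS.values() for s in seqs})


def build_prompt(case_description):
    """Assemble a prompt that includes local protocols as in-context guidance."""
    lines = ["You are assisting with brain MRI protocoling.",
             "Local standard protocols:"]
    for name, seqs in LOCAL_PROTOCOLS.items():
        lines.append(f"- {name}: {', '.join(seqs)}")
    lines.append('Respond with JSON: {"sequences": [...], "contrast": true|false}.')
    lines.append(f"Clinical case: {case_description}")
    return "\n".join(lines)


def parse_protocol(raw_json):
    """Parse a JSON reply and reject anything outside the expected shape."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    sequences = data.get("sequences")
    contrast = data.get("contrast")
    if not isinstance(sequences, list) or not all(isinstance(s, str) for s in sequences):
        raise ValueError("'sequences' must be a list of strings")
    if not isinstance(contrast, bool):
        raise ValueError("'contrast' must be a boolean")
    unknown = [s for s in sequences if s not in ALLOWED_SEQUENCES]
    if unknown:
        raise ValueError(f"unknown sequences: {unknown}")
    return {"sequences": sequences, "contrast": contrast}


prompt = build_prompt("Sudden left-sided weakness, onset 3 hours ago.")
# Stand-in for a model reply; no real model is called here.
mock_reply = '{"sequences": ["DWI", "FLAIR", "SWI", "TOF-MRA"], "contrast": false}'
protocol = parse_protocol(mock_reply)
print(protocol)
```

Rejecting sequence names outside the local vocabulary is a design choice made for this sketch; a deployment might instead flag unknown sequences for radiologist review.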
by Su Hwan Kim, Severin Schramm, Lena Schmitzer, Kerem Serguen, Sebastian Ziegelmayer, Felix Busch, Alexander Komenda, Marcus R. Makowski, Lisa C. Adams, Keno K. Bressem, Claus Zimmer, Jan Kirschke, Benedikt Wiestler, Dennis Hedderich, Tom Finck, Jannis Bodden