Evaluating large language model-generated brain MRI protocols: performance of GPT4o, o3-mini, DeepSeek-R1 and Qwen2.5-72B - Scorecard - MDSpire

By Su Hwan Kim, Severin Schramm, Lena Schmitzer, Kerem Serguen, Sebastian Ziegelmayer, Felix Busch, Alexander Komenda, Marcus R. Makowski, Lisa C. Adams, Keno K. Bressem, Claus Zimmer, Jan Kirschke, Benedikt Wiestler, Dennis Hedderich, Tom Finck, Jannis Bodden

September 3, 2025

Clinical Scorecard: Assessing the Efficacy of Large Language Models in Generating Brain MRI Protocols: A Comparison of GPT-4o, o3-mini, DeepSeek-R1, and Qwen2.5-72B

At a Glance

Category | Detail
Condition | Neurological conditions requiring brain MRI
Key Mechanisms | Use of large language models (LLMs) to generate granular, sequence-level brain MRI protocols based on clinical case descriptions
Target Population | Patients undergoing brain MRI for various neurological indications
Care Setting | Radiology departments performing brain MRI examinations

Key Highlights

  • Brain MRI protocoling is a complex, time-consuming task critical for diagnostic accuracy and efficiency.
  • Protocol errors are a leading cause of callback examinations, emphasizing the need for accurate protocol selection.
  • LLMs including GPT-4o, o3-mini, DeepSeek-R1, and Qwen2.5-72B were evaluated for their ability to generate brain MRI protocols using realistic clinical cases.

Guideline-Based Recommendations

Diagnosis

  • Use comprehensive clinical history and imaging request forms to guide MRI protocol selection.
  • Classify cases into categories (vascular, neoplasia, inflammation, degenerative, miscellaneous) to tailor protocols.

Management

  • Employ standardized imaging protocols for common clinical scenarios, with individualized adjustments for complex cases.
  • Consider AI tools such as LLMs to assist in protocol generation to reduce radiologist workload.

Monitoring & Follow-up

  • Evaluate inter-rater agreement on protocol sequences to ensure consistency and accuracy.
  • Monitor for protocol errors that may lead to repeat examinations.
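
One way to quantify inter-rater agreement on protocol sequences is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The summary above does not say which statistic was used, so the choice of kappa is an assumption, as are the function name and the 0/1 encoding (1 = rater would include the sequence, 0 = would not). A minimal sketch:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    Hypothetical helper: each list holds one label per rated item,
    e.g. 1 if the rater would include a given MRI sequence, else 0.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, `cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])` gives 0.5: the raters agree on three of four items (0.75), while chance agreement from their marginals is 0.5.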

Risks

  • Omission of critical MRI sequences can necessitate repeat scans, increasing patient burden and healthcare costs.
  • Excessive or unnecessary sequences may prolong scan time and increase exposure to contrast agents with potential adverse effects.
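
Both risks can be checked mechanically by comparing a generated protocol against a reference protocol. The sketch below is illustrative only: the function name, the returned keys, and the example sequence labels are assumptions, not the evaluation method described in the source.

```python
def compare_protocol(generated, reference):
    """Compare a generated sequence list with a reference sequence list.

    Hypothetical helper. Names are normalised (trimmed, lower-cased) so that
    "FLAIR" and " flair" count as the same sequence.
    """
    gen = {name.strip().lower() for name in generated}
    ref = {name.strip().lower() for name in reference}
    return {
        "missing": sorted(ref - gen),  # omitted sequences: risk of repeat scans
        "extra": sorted(gen - ref),    # unnecessary sequences: longer scan time
        "matched": sorted(gen & ref),
    }


# Illustrative labels only, not taken from the study data.
result = compare_protocol(
    generated=["T1", "FLAIR", "SWI"],
    reference=["T1", "T2", "FLAIR", "DWI"],
)
print(result["missing"])  # ['dwi', 't2']
print(result["extra"])    # ['swi']
```

Exact string matching is a simplification; in practice, synonymous sequence names would need to be mapped to a shared vocabulary before comparison.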

Patient & Prescribing Data

The evaluation used 150 fictitious brain MRI cases, based on anonymized real patient data, that represent typical and atypical clinical scenarios.

LLMs can generate brain MRI protocols with varying accuracy; enhanced prompts with local standard protocols improve performance.
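
To illustrate the difference between a base prompt and an enhanced prompt, the sketch below assembles a prompt string with or without local standard protocols. The wording, the function name, and the example protocol are hypothetical and are not the prompts used in the study; no model API is called.

```python
def build_prompt(case_description, standard_protocols=None):
    """Assemble a protocoling prompt, optionally with local standard protocols.

    Hypothetical helper. `standard_protocols` maps a protocol name to its
    list of sequences; passing None yields the base prompt.
    """
    lines = [
        "You are assisting with brain MRI protocoling.",
        "Propose the MRI sequences for the case below.",
    ]
    if standard_protocols:
        # In-context information: the institution's own standard protocols.
        lines.append("")
        lines.append("Local standard protocols:")
        for name, sequences in standard_protocols.items():
            lines.append(f"- {name}: {', '.join(sequences)}")
    lines.append("")
    lines.append(f"Case: {case_description}")
    return "\n".join(lines)


# Illustrative case text and protocol, not taken from the study.
base = build_prompt("Sudden left-sided weakness, onset 2 hours ago.")
enhanced = build_prompt(
    "Sudden left-sided weakness, onset 2 hours ago.",
    standard_protocols={"Stroke": ["DWI", "FLAIR"]},
)
```

Keeping the local protocols as a parameter makes it easy to run the same cases with and without them and compare the resulting outputs.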

Clinical Best Practices

  • Ensure anonymization and ethical considerations when using patient data for AI model training and evaluation.
  • Use consensus by experienced neuroradiologists to establish reference protocols for validation of AI-generated protocols.
  • Incorporate in-context learning with local standard protocols and sequence explanations to enhance LLM output quality.
  • Apply structured output modes (e.g., JSON schema) for programmatic analysis and integration of AI-generated protocols.
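
As a sketch of why structured output helps programmatic analysis, the snippet below parses a JSON reply and checks it against a small expected shape using only the standard library. The field names `sequences` and `contrast` are assumptions made for illustration; the source does not specify the schema that was used.

```python
import json


def parse_protocol(raw):
    """Parse and validate a JSON protocol reply.

    Hypothetical shape: {"sequences": [<str>, ...], "contrast": <bool>}.
    Raises ValueError if the reply is not valid JSON or has the wrong shape.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    sequences = data.get("sequences")
    if not isinstance(sequences, list) or not all(
        isinstance(item, str) for item in sequences
    ):
        raise ValueError("'sequences' must be a list of strings")

    contrast = data.get("contrast")
    if not isinstance(contrast, bool):
        raise ValueError("'contrast' must be true or false")

    return {"sequences": sequences, "contrast": contrast}


# Illustrative reply, not actual model output.
protocol = parse_protocol('{"sequences": ["T1", "FLAIR"], "contrast": false}')
```

Rejecting malformed replies at this step keeps free-text answers out of downstream comparisons with reference protocols.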
