Intra-axial primary brain tumor differentiation: comparing large language models on structured MRI reports vs. radiologists on images

By
Takeshi Nakaura
Hiroyuki Uetani
Naofumi Yoshida
Naoki Kobayashi
Yasunori Nagayama
Masafumi Kidoh
Jun-Ichiro Kuroda
Akitake Mukasa
Toshinori Hirai
August 22, 2025

European Radiology

Overview

This study evaluated the diagnostic performance of multiple large language models (LLMs) in differentiating intra-axial primary brain tumors using structured MRI reports. A cohort of 137 surgically confirmed cases was analyzed, comparing differential diagnoses that the LLMs generated from the reports with radiologists' interpretations of the images. The findings point to LLMs as potential assistive tools in neuroradiology, particularly when the input is a standardized imaging report.

Background

Intra-axial primary brain tumors arise within the brain parenchyma and present significant diagnostic challenges due to overlapping imaging features. MRI is the cornerstone for evaluating these tumors, providing detailed morphological and compositional information. Recent advances in artificial intelligence, especially large language models, have shown promise in medical diagnostics, but their ability to interpret medical images directly remains limited. Structured MRI reports, containing standardized imaging findings, may serve as an effective input for LLMs to assist in tumor differentiation. However, prior studies have not specifically addressed the performance of LLMs in this complex diagnostic domain.

Data Highlights

Parameter	Value
Initial patients undergoing surgery for suspected intra-axial brain tumors	499
Patients with preoperative MRI scans within 1 week before surgery	486
Patients with available structured MRI reports	148
Excluded cases confirmed as brain metastases	11
Final cohort of surgically confirmed intra-axial primary brain tumors	137
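
The cohort flow in the table above can be checked with simple arithmetic. The sketch below only restates the table's counts and derives how many patients dropped out at each step. The step labels are paraphrased and the function name `exclusions` is hypothetical, not taken from the study.

```python
# Cohort counts copied from the Data Highlights table.
# Labels are paraphrased; the helper name is illustrative only.
COHORT_STEPS = [
    ("Surgery for suspected intra-axial brain tumor", 499),
    ("Preoperative MRI within 1 week before surgery", 486),
    ("Structured MRI report available", 148),
    ("Brain metastases excluded (final cohort)", 137),
]


def exclusions(steps):
    """Return (label, remaining, dropped_from_previous_step) per step."""
    rows = []
    previous = None
    for label, remaining in steps:
        dropped = 0 if previous is None else previous - remaining
        rows.append((label, remaining, dropped))
        previous = remaining
    return rows


for label, remaining, dropped in exclusions(COHORT_STEPS):
    print(f"{remaining:>4}  (-{dropped:>3})  {label}")
```

The last step drops 11 patients (148 to 137), which matches the number of excluded brain metastases reported in the table.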

Key Findings

Structured MRI reports were created by an experienced neuroradiologist and translated accurately for LLM input.
Multiple LLMs, including OpenAI GPT series, Anthropic Claude, Meta Llama-2, Alibaba Qwen, and Google Gemini, were evaluated using standardized structured reports.
Each LLM generated a ranked list of five potential diagnoses based on structured report data.
The study cohort consisted exclusively of surgically confirmed intra-axial primary brain tumors, so every case had a confirmed reference diagnosis against which to score the LLMs and radiologists.
Use of structured reports as input circumvents current limitations of LLMs in direct image interpretation.
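
Because each LLM returns a ranked list of five candidate diagnoses, one common way to score such output is top-k accuracy: the share of cases whose confirmed diagnosis appears among the first k entries. This summary does not state which metric the authors used, so the following is a minimal sketch under that assumption. The function name `top_k_accuracy` and the toy cases are hypothetical and are not data from the study.

```python
def top_k_accuracy(ranked_predictions, truths, k):
    """Fraction of cases whose true diagnosis is within the first k ranked guesses."""
    if len(ranked_predictions) != len(truths):
        raise ValueError("need one ranked list per confirmed diagnosis")
    if not truths:
        raise ValueError("no cases to score")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, truths)
        if truth in ranked[:k]
    )
    return hits / len(truths)


# Hypothetical toy data (NOT from the study): four cases, five ranked guesses each.
example_predictions = [
    ["glioblastoma", "primary CNS lymphoma", "astrocytoma", "oligodendroglioma", "ependymoma"],
    ["astrocytoma", "oligodendroglioma", "glioblastoma", "ependymoma", "primary CNS lymphoma"],
    ["primary CNS lymphoma", "glioblastoma", "astrocytoma", "ependymoma", "oligodendroglioma"],
    ["glioblastoma", "astrocytoma", "ependymoma", "primary CNS lymphoma", "medulloblastoma"],
]
example_truths = ["glioblastoma", "oligodendroglioma", "glioblastoma", "oligodendroglioma"]

print("top-1:", top_k_accuracy(example_predictions, example_truths, 1))
print("top-5:", top_k_accuracy(example_predictions, example_truths, 5))
```

Reporting both top-1 and top-5 separates how often the leading diagnosis is correct from how often the correct diagnosis appears anywhere in the differential.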

Clinical Implications

The findings suggest that LLMs can effectively utilize structured MRI reports to assist in differentiating complex intra-axial primary brain tumors, potentially augmenting radiologist workflows. Incorporating LLM-based diagnostic support may improve diagnostic consistency and efficiency, especially in centers with limited neuroradiology expertise. However, reliance on standardized reporting and expert validation remains essential to ensure accuracy.

Conclusion

This study demonstrates the promising role of large language models in interpreting structured MRI reports for the differentiation of intra-axial primary brain tumors. Further research is warranted to integrate these tools into clinical practice and to evaluate their performance alongside direct image analysis.

References

Nakaura T, Uetani H, Yoshida N, et al. Differentiating Intra-Axial Primary Brain Tumors: A Comparison of Large Language Models Analyzing Structured MRI Reports Versus Radiologists Interpreting Images. European Radiology.