Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors

By
Yasuhito Mitsuyama
Hiroyuki Tatekawa
Hirotaka Takita
Fumi Sasaki
Akane Tashiro
Satoshi Oue
Shannon L. Walston
Yuta Nonomiya
Ayumi Shintani
Yukio Miki
Daiju Ueda
August 28, 2024

European Radiology

Overview

This study compared the diagnostic performance of GPT-4-based ChatGPT with that of neuroradiologists and general radiologists on real-world brain tumor MRI reports. GPT-4 showed promising diagnostic capability, suggesting it may be useful as a clinical decision support tool in neuroradiology.

Background

Large language models (LLMs) such as GPT-4 have shown potential in medical applications, including radiology. Prior studies focused on curated, quiz-like cases, which limits how well their results generalize to real-world clinical practice. Brain tumor MRI reports are critical for guiding treatment decisions, and pathological results provide a definitive diagnosis against which predictions can be checked. This study assesses GPT-4's diagnostic accuracy using authentic clinical radiology reports and compares its performance with that of expert neuroradiologists and general radiologists.

Data Highlights

The study retrospectively collected preoperative brain tumor MRI reports from two institutions over multiple years. Reports were translated from Japanese to English, and explicit diagnoses were removed to prevent data leakage. GPT-4 was prompted to generate three ranked differential diagnoses per case. Radiologists independently reviewed the same findings and gave their own diagnoses. The study design followed the STARD guidelines and received ethics committee approval.
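As an illustration of how three ranked differentials per case might be scored against the pathological diagnosis, here is a minimal Python sketch. It is not the authors' evaluation code: the `Case` type, the `top_k_accuracy` helper, the exact-string matching rule, and the four toy cases are all assumptions made for this example, and none of the values come from the study.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical record: up to three differentials, most likely first.
    ranked_differentials: list[str]
    # Reference standard taken from pathology.
    pathological_diagnosis: str


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(label.lower().split())


def top_k_accuracy(cases: list[Case], k: int) -> float:
    """Fraction of cases whose pathological diagnosis is among the first k differentials.

    Assumption: a prediction counts as correct only on an exact (normalized)
    string match. A real study would more likely rely on expert adjudication.
    """
    if not cases:
        raise ValueError("at least one case is required")
    hits = 0
    for case in cases:
        candidates = {normalize(d) for d in case.ranked_differentials[:k]}
        if normalize(case.pathological_diagnosis) in candidates:
            hits += 1
    return hits / len(cases)


# Toy data invented for illustration only.
toy_cases = [
    Case(["glioblastoma", "metastasis", "lymphoma"], "Glioblastoma"),
    Case(["meningioma", "schwannoma", "metastasis"], "schwannoma"),
    Case(["pituitary adenoma", "craniopharyngioma", "meningioma"], "germinoma"),
    Case(["metastasis", "glioblastoma", "abscess"], "metastasis"),
]

final_diagnosis_accuracy = top_k_accuracy(toy_cases, k=1)  # first-ranked only
differential_accuracy = top_k_accuracy(toy_cases, k=3)     # any of the three
```

Scoring with `k=1` treats only the first-ranked diagnosis as the answer, while `k=3` credits a case when the correct diagnosis appears anywhere in the list.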

Key Findings

GPT-4-based ChatGPT generated differential diagnoses from real-world brain tumor MRI reports with diagnostic accuracy comparable to neuroradiologists.
GPT-4 outperformed general radiologists in diagnostic accuracy on the same dataset.
The model effectively ranked three differential diagnoses in order of likelihood, aiding clinical decision-making.
Use of real-world, uncurated radiology reports highlighted GPT-4's robustness beyond typical quiz-style cases.
Translating the reports into English and engineering the prompt were intended to keep the input accurate and free of information loss.
GPT-4’s diagnostic suggestions could reduce time and cognitive load in clinical neuroradiology workflows.
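Because GPT-4 and the radiologists were evaluated on the same cases, a comparison of their accuracies is a paired comparison. This summary does not say which statistical test the authors used. The sketch below shows one common choice for paired correct/incorrect outcomes, the exact McNemar test, purely as an assumption. The function name and the toy outcome vectors are hypothetical and do not reflect the study's data.

```python
from math import comb


def mcnemar_exact(a_correct: list[bool], b_correct: list[bool]) -> tuple[int, int, float]:
    """Exact two-sided McNemar test for two readers scored on the same cases.

    Returns (a_only, b_only, p_value), where a_only counts cases only reader A
    got right and b_only counts cases only reader B got right.
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("both readers must be scored on the same cases")
    a_only = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)
    b_only = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)
    n = a_only + b_only  # only discordant pairs carry information
    if n == 0:
        return a_only, b_only, 1.0
    k = min(a_only, b_only)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return a_only, b_only, min(1.0, 2 * tail)


# Toy outcomes invented for illustration: True means the reader was correct.
model_correct = [True] * 8 + [False] * 2
reader_correct = [True, True, True, False, False, False, False, False, True, False]

model_only, reader_only, p_value = mcnemar_exact(model_correct, reader_correct)
```

The test ignores cases where both readers agree and asks only whether the discordant cases split unevenly between the two readers.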

Clinical Implications

GPT-4-based ChatGPT shows promise as a diagnostic aid in neuroradiology, particularly for brain tumor MRI interpretation. Its ability to generate ranked differential diagnoses from routine clinical reports may support radiologists by enhancing diagnostic accuracy and efficiency. Integration of such AI tools could complement expert review and assist in complex diagnostic scenarios.

Conclusion

This study demonstrates that GPT-4-based ChatGPT can achieve diagnostic performance comparable to that of neuroradiologists when analyzing real-world brain tumor MRI reports, underscoring its potential as a valuable adjunct in clinical radiology practice.

References

OpenAI GPT-4 Model -- ChatGPT (May 24 version)
Standards for Reporting of Diagnostic Accuracy Studies (STARD) -- 2015
Osaka Metropolitan University Graduate School of Medicine Ethical Committee -- Approval no. 2023-015
