Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors

By Yasuhito Mitsuyama, Hiroyuki Tatekawa, Hirotaka Takita, Fumi Sasaki, Akane Tashiro, Satoshi Oue, Shannon L. Walston, Yuta Nonomiya, Ayumi Shintani, Yukio Miki, Daiju Ueda

August 28, 2024

Diagnostic Accuracy of GPT-4 ChatGPT vs Radiologists on Brain Tumor MRI Reports

Overview

This study compared the diagnostic performance of GPT-4-based ChatGPT with that of neuroradiologists and general radiologists on real-world brain tumor MRI reports. GPT-4's accuracy was comparable to that of neuroradiologists, suggesting potential utility as a clinical decision support tool in neuroradiology.

Background

Large language models (LLMs) such as GPT-4 have shown potential in medical applications, including radiology. Prior studies focused on curated, quiz-like cases, which limits how well their results generalize to real-world clinical practice. Brain tumor MRI reports guide treatment decisions, and pathological outcomes provide the definitive diagnosis against which a report can be checked. This study assessed GPT-4's diagnostic accuracy on authentic clinical radiology reports and compared it with that of neuroradiologists and general radiologists.

Data Highlights

The study retrospectively collected preoperative brain tumor MRI reports from two institutions over multiple years. Reports were translated from Japanese to English, and explicit diagnoses were removed so that the answer could not leak into the model's input. GPT-4 was prompted to generate three ranked differential diagnoses per case. Radiologists independently reviewed the same findings and gave their own diagnoses. The study design followed the STARD guidelines and received ethical approval.
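The preparation steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the report layout (a line starting with "Impression" that holds the explicit diagnosis), the prompt wording, and the function names `strip_impression` and `build_prompt` are all assumptions made for illustration, and the sample report is invented.

```python
def strip_impression(report: str) -> str:
    """Keep only the text before the first line starting with 'Impression'.

    Assumption: the explicit diagnosis lives in an 'Impression' section.
    Real reports may need a different rule or manual review.
    """
    kept = []
    for line in report.splitlines():
        if line.strip().lower().startswith("impression"):
            break
        kept.append(line)
    return "\n".join(kept).strip()


def build_prompt(findings: str, n_differentials: int = 3) -> str:
    """Assemble a hypothetical prompt asking for ranked differential diagnoses."""
    return (
        "You are given the findings of a preoperative brain MRI report.\n"
        f"List the {n_differentials} most likely differential diagnoses, "
        "ranked from most to least likely.\n\n"
        f"Report:\n{findings.strip()}"
    )


if __name__ == "__main__":
    # Invented example report, used only to exercise the two helpers.
    sample = (
        "Findings: Extra-axial mass with a dural tail.\n"
        "Impression: Meningioma."
    )
    print(build_prompt(strip_impression(sample)))
```

The resulting string would then be sent to the model; that call is left out here because the source does not describe how the model was accessed beyond naming ChatGPT.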

Key Findings

  • GPT-4-based ChatGPT generated differential diagnoses from real-world brain tumor MRI reports with diagnostic accuracy comparable to neuroradiologists.
  • GPT-4 outperformed general radiologists in diagnostic accuracy on the same dataset.
  • The model ranked three differential diagnoses in order of likelihood, a format that may aid clinical decision-making.
  • Use of real-world, uncurated radiology reports highlighted GPT-4's robustness beyond typical quiz-style cases.
  • Translation and prompt engineering were used to prepare the reports for the model while preserving their findings.
  • GPT-4’s diagnostic suggestions could reduce time and cognitive load in clinical neuroradiology workflows.
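Comparisons like those above rest on scoring each ranked list against the pathological diagnosis. The sketch below shows one common way to do that, top-k accuracy, where k=1 scores only the first-ranked diagnosis and k=3 scores the full three-item differential. The exact-match rule (case-insensitive string equality) and the toy cases are assumptions for illustration; they are not the study's scoring method or data.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose true diagnosis appears in the top k predictions."""
    if len(ranked_predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = 0
    for ranked, truth in zip(ranked_predictions, truths):
        # Assumed matching rule: case-insensitive exact string match.
        top_k = [p.strip().lower() for p in ranked[:k]]
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(truths)


if __name__ == "__main__":
    # Toy cases, invented purely to exercise the function.
    predictions = [
        ["meningioma", "schwannoma", "metastasis"],
        ["glioblastoma", "lymphoma", "metastasis"],
        ["lymphoma", "glioblastoma", "abscess"],
    ]
    pathology = ["meningioma", "metastasis", "pituitary adenoma"]
    print("top-1:", top_k_accuracy(predictions, pathology, k=1))
    print("top-3:", top_k_accuracy(predictions, pathology, k=3))
```

In practice, free-text diagnoses rarely match by string equality, so a study of this kind would more plausibly rely on human adjudication of whether a suggested diagnosis agrees with pathology.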

Clinical Implications

GPT-4-based ChatGPT shows promise as a diagnostic aid in neuroradiology, particularly for brain tumor MRI interpretation. Its ability to generate ranked differential diagnoses from routine clinical reports may support radiologists by enhancing diagnostic accuracy and efficiency. Integration of such AI tools could complement expert review and assist in complex diagnostic scenarios.

Conclusion

This study demonstrates that GPT-4-based ChatGPT can achieve diagnostic performance comparable to that of neuroradiologists when analyzing real-world brain tumor MRI reports, underscoring its potential as an adjunct in clinical radiology practice.

References

  1. OpenAI GPT-4 Model -- ChatGPT (May 24 version)
  2. Standards for Reporting of Diagnostic Accuracy Studies (STARD) -- 2015
  3. Osaka Metropolitan University Graduate School of Medicine Ethical Committee -- Approval no. 2023-015
