Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors - Report - MDSpire
Diagnostic Accuracy of GPT-4 ChatGPT vs Radiologists on Brain Tumor MRI Reports
Overview
This study evaluated the diagnostic performance of GPT-4-based ChatGPT against that of neuroradiologists and general radiologists, using real-world brain tumor MRI radiology reports. GPT-4's accuracy was comparable to that of neuroradiologists and exceeded that of general radiologists, suggesting potential utility as a clinical decision support tool in neuroradiology.
Background
Large language models (LLMs) such as GPT-4 have shown potential in medical applications, including radiology. Prior studies, however, relied mainly on curated, quiz-style cases, which limits how well their results generalize to real-world clinical practice. Brain tumor MRI reports are critical for guiding treatment decisions, and pathological results provide the definitive diagnosis against which any interpretation can be checked. This study assesses GPT-4's diagnostic accuracy using authentic clinical radiology reports and compares its performance with that of expert neuroradiologists and general radiologists.
Data Highlights
The study retrospectively collected preoperative brain tumor MRI reports from two institutions over multiple years. Reports were translated from Japanese to English, and explicit diagnoses were removed to prevent data leakage. GPT-4 was prompted to generate three differential diagnoses per case, ranked by likelihood. Radiologists independently reviewed the same findings and provided their own diagnoses. The study design followed the STARD (Standards for Reporting of Diagnostic Accuracy Studies) guidelines and received ethical approval.
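The summary does not say exactly how a ranked list of three differentials was scored against the pathological diagnosis. As an illustration only, the sketch below assumes a common convention: top-1 accuracy (the first-ranked diagnosis matches pathology) and top-3 accuracy (any of the three matches). The `Case` class, the `normalize` and `top_k_accuracy` helpers, and the example cases are hypothetical and are not study data; real evaluations usually rely on expert adjudication of synonymous diagnoses rather than string matching.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical record: one report's ranked differentials and its pathology."""
    ranked_differentials: list[str]  # index 0 = most likely diagnosis
    pathology: str                   # reference-standard (pathological) diagnosis


def normalize(label: str) -> str:
    """Assumed simple matching rule: case-insensitive, whitespace-collapsed."""
    return " ".join(label.lower().split())


def top_k_accuracy(cases: list[Case], k: int) -> float:
    """Fraction of cases whose pathology appears among the first k differentials."""
    if not cases:
        raise ValueError("no cases to score")
    hits = 0
    for case in cases:
        candidates = {normalize(d) for d in case.ranked_differentials[:k]}
        if normalize(case.pathology) in candidates:
            hits += 1
    return hits / len(cases)


# Made-up illustrative cases (not from the study).
cases = [
    Case(["glioblastoma", "metastasis", "lymphoma"], "Glioblastoma"),
    Case(["meningioma", "schwannoma", "hemangiopericytoma"], "schwannoma"),
    Case(["metastasis", "abscess", "glioblastoma"], "lymphoma"),
]

print(f"top-1: {top_k_accuracy(cases, 1):.2f}")  # 1 of 3 cases -> 0.33
print(f"top-3: {top_k_accuracy(cases, 3):.2f}")  # 2 of 3 cases -> 0.67
```

Under this convention the same function can score GPT-4's lists and each radiologist's lists, so the groups are compared on identical cases and an identical matching rule.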
Key Findings
GPT-4-based ChatGPT generated differential diagnoses from real-world brain tumor MRI reports with diagnostic accuracy comparable to neuroradiologists.
GPT-4 outperformed general radiologists in diagnostic accuracy on the same dataset.
The model ranked its three differential diagnoses in order of likelihood, a format that can aid clinical decision-making.
The use of real-world, uncurated radiology reports suggests that GPT-4's performance extends beyond typical quiz-style cases.
Translation and prompt engineering were used to process the reports accurately without losing clinical information.
GPT-4’s diagnostic suggestions could reduce time and cognitive load in clinical neuroradiology workflows.
Clinical Implications
GPT-4-based ChatGPT shows promise as a diagnostic aid in neuroradiology, particularly for brain tumor MRI interpretation. Its ability to generate ranked differential diagnoses from routine clinical reports may support radiologists by enhancing diagnostic accuracy and efficiency. Integration of such AI tools could complement expert review and assist in complex diagnostic scenarios.
Conclusion
This study demonstrates that GPT-4-based ChatGPT can achieve diagnostic performance comparable to that of neuroradiologists when analyzing real-world brain tumor MRI reports, underscoring its potential as a valuable adjunct in clinical radiology practice.
This twice-monthly newsletter highlights recently published research where Dana-Farber faculty are listed as first or senior authors. The information is pulled from PubMed and this issue notes papers published from February 16 - 28.