Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors
By
Yasuhito Mitsuyama
Hiroyuki Tatekawa
Hirotaka Takita
Fumi Sasaki
Akane Tashiro
Satoshi Oue
Shannon L. Walston
Yuta Nonomiya
Ayumi Shintani
Yukio Miki
Daiju Ueda
August 28, 2024
Clinical Scorecard: Evaluation of Diagnostic Accuracy of GPT-4-Driven ChatGPT Compared to Radiologists Analyzing Real-World Brain Tumor Radiology Reports
At a Glance
Category | Detail
Condition | Brain tumors diagnosed via MRI radiology reports
Key Mechanisms | GPT-4-based ChatGPT processes real-world MRI radiology report text to generate differential and final diagnoses
Target Population | Patients undergoing brain MRI for preoperative brain tumor evaluation
Care Setting | Clinical radiology departments in tertiary hospitals
Key Highlights
First study evaluating GPT-4 diagnostic accuracy using uncurated, real-world brain tumor MRI radiology reports.
Comparison of GPT-4-based ChatGPT diagnostic performance with neuroradiologists and general radiologists.
Use of standardized prompts and blinded reading tests to ensure fair comparison and avoid data leakage.
Guideline-Based Recommendations
Diagnosis
Use GPT-4-based ChatGPT to generate three ranked differential diagnoses from MRI report findings.
Verify GPT-4 outputs against neuroradiologist and general radiologist interpretations for clinical decision-making.
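As an illustration of the ranked-differential step above, the following minimal Python sketch builds a fixed prompt from report findings and parses a numbered reply into three ranked diagnoses. The prompt wording, the function names (`build_prompt`, `parse_ranked_differentials`), and the expected reply format are all assumptions made for this example; they are not the prompt or tooling used in the study, and no model API call is shown.

```python
import re

# Hypothetical prompt wording for illustration only; not the study's prompt.
PROMPT_TEMPLATE = (
    "List the three most likely differential diagnoses, ranked from most to "
    "least likely, based on the following brain MRI findings.\n"
    "Answer with exactly three numbered lines.\n\n"
    "Findings:\n{findings}"
)


def build_prompt(findings: str) -> str:
    """Fill the fixed template so every case is prompted identically."""
    return PROMPT_TEMPLATE.format(findings=findings.strip())


def parse_ranked_differentials(response: str, k: int = 3) -> list[str]:
    """Extract up to k diagnoses from numbered lines such as '1. Glioblastoma'."""
    ranked = []
    for line in response.splitlines():
        match = re.match(r"\s*(\d+)[.)]\s*(.+)", line)
        if match:
            ranked.append(match.group(2).strip())
    return ranked[:k]
```

Keeping the template fixed across cases is one way to apply a standardized prompt; the parsed list would still need to be checked by a radiologist before any clinical use.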
Management
Consider GPT-4 as a complementary tool to support radiologists in diagnostic workflows, especially for complex brain tumor cases.
Maintain multidisciplinary review including pathological confirmation for definitive diagnosis.
Monitoring & Follow-up
Regularly assess GPT-4 diagnostic outputs against clinical outcomes and pathology to monitor accuracy and reliability.
Update and validate GPT-4 prompts and input data to reflect evolving clinical practice.
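The monitoring step above can be sketched as a simple accuracy tally against pathology. This Python example assumes each case is recorded as a tuple of pathological diagnosis, the model's final diagnosis, and its ranked differential list. The function names and the exact-match comparison after normalization are simplifications chosen for this sketch; in practice a clinician would likely need to adjudicate whether two diagnosis labels are equivalent.

```python
def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(label.lower().split())


def score_cases(cases):
    """Score model output against pathology.

    cases: iterable of (pathology, final_dx, differentials) tuples, where
    differentials is the ranked list of differential diagnoses.
    Returns final-diagnosis accuracy and the fraction of cases in which
    pathology appears anywhere in the differential list.
    """
    n = final_hits = diff_hits = 0
    for pathology, final_dx, differentials in cases:
        truth = normalize(pathology)
        n += 1
        final_hits += normalize(final_dx) == truth
        diff_hits += any(normalize(d) == truth for d in differentials)
    if n == 0:
        raise ValueError("no cases to score")
    return {
        "final_accuracy": final_hits / n,
        "differential_accuracy": diff_hits / n,
    }
```

Running the same tally at regular intervals, and separately for radiologist readings, gives comparable figures for tracking reliability over time.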
Risks
Unstructured, heterogeneous real-world report data may introduce bias and reduce GPT-4 diagnostic accuracy.
Overreliance on AI outputs without expert radiologist review may lead to diagnostic errors.
Patient & Prescribing Data
Patients with brain tumors undergoing MRI and subsequent radiological evaluation
Accurate radiological diagnosis via GPT-4 can influence treatment decisions such as surgery, medication, or monitoring strategies.
Clinical Best Practices
Use GPT-4-based ChatGPT as an adjunct diagnostic aid rather than a standalone tool.
Ensure radiology reports are carefully translated and verified to preserve diagnostic information when using AI tools.
Incorporate multidisciplinary expertise including neuroradiologists and general radiologists in diagnostic workflows.
Apply standardized prompting techniques to optimize AI diagnostic output quality.
Conduct ongoing validation studies comparing AI outputs with clinical gold standards such as pathology.