Preliminary evaluation of DeepSeek-R1 and GPT-5.3 in selected PET/CT clinical scenarios: patient preparation, report interpretation, and diagnostic reasoning - Report - MDSpire

By Runze Duan, Jing Pang, Lu Zheng, Ziyu Guo, Tianyue Li, Yanzhu Bian, Yujing Hu

June 11, 2026

Clinical Report: Initial Assessment of DeepSeek-R1 and GPT-5.3 in PET/CT

Overview

This study evaluates how DeepSeek-R1 and GPT-5.3 perform in three PET/CT clinical scenarios: patient preparation, report interpretation, and diagnostic reasoning.

Background

[18F]FDG PET/CT imaging is increasingly used in clinical practice, creating a need for efficient tools that can assist nuclear medicine professionals. This study assesses the clinical applicability of DeepSeek-R1 as a cost-effective AI assistant, with GPT-5.3 as the comparator.

Data Highlights

Model         Appropriateness   Helpfulness   Empathy   Inconsistency   Valid References
DeepSeek-R1   94.9%             100%          91.7%     7.7%            37%
GPT-5.3       94.9%             100%          66.7%     5.1%            33%

Key Findings

  • DeepSeek-R1 achieved 94.9% appropriateness and 100% helpfulness across 39 tasks.
  • 91.7% of DeepSeek-R1's responses to follow-up inquiries were rated empathetic.
  • DeepSeek-R1 had a 7.7% inconsistency rate, primarily in tumor staging.
  • GPT-5.3 had a lower inconsistency rate than DeepSeek-R1 (5.1% vs. 7.7%) but was also rated less empathetic (66.7% vs. 91.7%).
  • For difficult cases, both models reached a primary diagnosis accuracy of only 10% and a differential diagnosis accuracy of 60%.
  • 37% of DeepSeek-R1's cited references were fully valid, compared to 33% for GPT-5.3.
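
The rates above can be turned back into approximate task counts, which makes the small sample size easier to see. The sketch below is only an illustration: the helper name `implied_count` is hypothetical, the only denominator the report states is the 39 tasks behind appropriateness and helpfulness, and applying the same 39 to the inconsistency rates is an assumption rather than something the report confirms.

```python
def implied_count(percent: float, total: int) -> int:
    """Return the whole-number count closest to `percent`% of `total`."""
    return round(percent / 100 * total)


TOTAL_TASKS = 39  # task count stated in the report

# Appropriateness and helpfulness are reported across the 39 tasks.
appropriate = implied_count(94.9, TOTAL_TASKS)
helpful = implied_count(100.0, TOTAL_TASKS)

# ASSUMPTION: the inconsistency rates share the 39-task denominator.
inconsistent_deepseek = implied_count(7.7, TOTAL_TASKS)
inconsistent_gpt = implied_count(5.1, TOTAL_TASKS)

print(f"appropriate: {appropriate}/{TOTAL_TASKS}")
print(f"helpful: {helpful}/{TOTAL_TASKS}")
print(f"inconsistent (DeepSeek-R1, assumed denominator): {inconsistent_deepseek}/{TOTAL_TASKS}")
print(f"inconsistent (GPT-5.3, assumed denominator): {inconsistent_gpt}/{TOTAL_TASKS}")
```

Read this way, 94.9% appropriateness corresponds to roughly 37 of 39 tasks, so a single task shifts the rate by about 2.6 percentage points. The empathy, diagnosis accuracy, and reference validity figures are left out because the report does not give their denominators here.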

Clinical Implications

The findings suggest that while both DeepSeek-R1 and GPT-5.3 can assist in clinical scenarios, they cannot replace clinicians due to reference validity issues and potential inconsistencies. DeepSeek-R1 may serve as a cost-effective auxiliary tool in nuclear medicine.

Conclusion

DeepSeek-R1 and GPT-5.3 exhibit complementary strengths but face challenges with reference validity and consistency.
