Preliminary evaluation of DeepSeek-R1 and GPT-5.3 in selected PET/CT clinical scenarios: patient preparation, report interpretation, and diagnostic reasoning - Takeaways - MDSpire

By Runze Duan, Jing Pang, Lu Zheng, Ziyu Guo, Tianyue Li, Yanzhu Bian, Yujing Hu

June 11, 2026

  1. DeepSeek-R1 and GPT-5.3 were evaluated across 39 standardized tasks in patient communication, report interpretation, and diagnosis.

  2. DeepSeek-R1 achieved 94.9% appropriateness and 100% helpfulness, with 91.7% of follow-up responses rated empathetic.

  3. GPT-5.3 matched DeepSeek-R1 in appropriateness and helpfulness but had a lower empathy score of 66.7% for follow-up inquiries.

  4. Both models demonstrated similar reference hallucination issues, with DeepSeek-R1 showing 37% valid references compared to GPT-5.3's 33%.

  5. DeepSeek-R1 is identified as a cost-effective auxiliary tool, requiring future optimization for consistency and diagnostic accuracy.
