Preliminary evaluation of DeepSeek-R1 and GPT-5.3 in selected PET/CT clinical scenarios: patient preparation, report interpretation, and diagnostic reasoning

1. DeepSeek-R1 and GPT-5.3 were evaluated on 39 standardized tasks covering patient communication, report interpretation, and diagnosis.

2. DeepSeek-R1 achieved 94.9% appropriateness and 100% helpfulness, with 91.7% of its follow-up responses rated empathetic.

3. GPT-5.3 matched DeepSeek-R1 in appropriateness and helpfulness, but only 66.7% of its follow-up responses were rated empathetic.

4. Both models hallucinated references at similar rates: 37% of DeepSeek-R1's references were valid, compared with 33% of GPT-5.3's.

5. DeepSeek-R1 is identified as a cost-effective auxiliary tool, but it needs further optimization for consistency and diagnostic accuracy.

Frontiers in Medicine

by Runze Duan, Jing Pang, Lu Zheng, Ziyu Guo, Tianyue Li, Yanzhu Bian, Yujing Hu
June 11, 2026
