Preliminary evaluation of DeepSeek-R1 and GPT-5.3 in selected PET/CT clinical scenarios: patient preparation, report interpretation, and diagnostic reasoning - Scorecard - MDSpire
Clinical Scorecard: Initial Assessment of DeepSeek-R1 and GPT-5.3 in Specific PET/CT Clinical Contexts: Patient Preparation, Report Analysis, and Diagnostic Evaluation
At a Glance
Category: Detail
Condition: [18F]FDG PET/CT utilization
Key Mechanisms: Evaluation of AI models in patient communication, report interpretation, and diagnosis
Target Population: Patients undergoing [18F]FDG PET/CT scans
Care Setting: Nuclear medicine
Key Highlights
DeepSeek-R1 achieved 94.9% appropriateness and 100% helpfulness across tasks.
GPT-5.3 showed equivalent performance with 94.9% appropriateness and 100% helpfulness.
DeepSeek-R1 scored higher on empathy than GPT-5.3 for follow-up inquiries (91.7% vs. 66.7%).
Both models had similar rates of substantial inconsistencies in responses.
Guideline-Based Recommendations
Diagnosis
Both models demonstrated 10% primary diagnosis accuracy and 60% differential diagnosis accuracy.
Management
AI tools should not replace clinicians in critical processes such as obtaining informed consent.
Monitoring & Follow-up
Further optimization is needed in response consistency, diagnostic accuracy, and reference validity.
Risks
AI models may produce incorrect answers and inconsistencies.
Patient & Prescribing Data
Patients requiring PET/CT imaging for diagnosis and treatment planning.
AI can assist in delivering standardized patient information.
Clinical Best Practices
Utilize AI models as auxiliary tools to support nuclear medicine workflows.
Ensure clinical validation of AI tools before implementation.