Performance of deepseek-R1 and ChatGPT-5.4 thinking in the medical laboratory professional title examination: accuracy, stability, and comparison with interns - Report - MDSpire
Clinical Report: Evaluation of Deepseek-R1 and ChatGPT-5.4 Performance
Overview
This study evaluates the accuracy and reproducibility of Deepseek-R1 and ChatGPT-5.4 in the Medical Laboratory Junior Professional Title Examination, comparing their performance with that of interns.
Background
The integration of artificial intelligence in medical education is transforming how knowledge is delivered and assessed. Evaluating AI models on standardized examinations is crucial for understanding their role in medical training. This study focuses on the performance of two AI models, Deepseek-R1 and ChatGPT-5.4, on the Medical Laboratory Junior Professional Title Examination.
Data Highlights
Model | Accuracy Comparison | Reproducibility
Deepseek-R1 | Higher accuracy than ChatGPT-5.4 in Papers I, II, and III | Fleiss' kappa > 0.7
ChatGPT-5.4 | Significant cross-disciplinary differences in Papers I, II, and III | Fleiss' kappa > 0.7
Interns | Comparable to AI in Paper I; lower in Papers II, III, and IV | N/A
Key Findings
Both models showed good reproducibility with Fleiss' kappa coefficients exceeding 0.7.
No significant differences in accuracy were found across question types for either model.
Deepseek-R1 outperformed ChatGPT-5.4 in Papers I, II, and III.
Interns performed comparably to the AI models only in Paper I and scored lower in Papers II, III, and IV.
Deepseek-R1 exhibited the highest overall performance across the examination.
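For readers unfamiliar with the reproducibility metric cited above, the following is a minimal sketch of how Fleiss' kappa can be computed when the same questions are submitted to a model several times. It is not the study's own analysis code: the function name `fleiss_kappa`, the number of repeated runs, and the answer data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels.

    Each inner list holds the answers given to one question across repeated
    runs (hypothetical setup); every question needs the same number of runs.
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("at least one item is required")
    n_runs = len(ratings[0])
    if n_runs < 2 or any(len(item) != n_runs for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = {label for item in ratings for label in item}

    # Observed agreement: share of agreeing run pairs, averaged over items.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_runs) / (n_runs * (n_runs - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each answer category.
    total = n_items * n_runs
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories)

    if p_e == 1.0:
        # Only one category was ever used; kappa is undefined in that case.
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical data: 5 multiple-choice questions, each answered in 3 runs.
example_runs = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["C", "C", "D"],  # the only question with an inconsistent answer
    ["D", "D", "D"],
    ["A", "A", "A"],
]

print(round(fleiss_kappa(example_runs), 4))  # 0.8125
```

In this toy example a single inconsistent answer among fifteen yields a kappa of 0.8125, which would fall above the 0.7 threshold reported for both models.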
Clinical Implications
The findings suggest that AI models such as Deepseek-R1 and ChatGPT-5.4 can serve as useful tools for examination preparation in medical laboratory education. Their accuracy and reproducibility indicate potential utility in enhancing learning outcomes for medical students.
Conclusion
Deepseek-R1 and ChatGPT-5.4 both performed well and reproducibly (Fleiss' kappa > 0.7) on the Medical Laboratory Junior Professional Title Examination, with Deepseek-R1 showing higher accuracy in Papers I, II, and III.