Performance of deepseek-R1 and ChatGPT-5.4 thinking in the medical laboratory professional title examination: accuracy, stability, and comparison with interns
This study systematically evaluates the accuracy and reproducibility of Deepseek-R1 and ChatGPT-5.4 on the Medical Laboratory Junior Professional Title Examination and compares their performance with that of interns.
Approach:
Key Findings:
Neither model showed significant differences in accuracy across the three repeated sessions (p > 0.05).
No significant differences in accuracy were observed across question types for either model (p > 0.05).
Deepseek-R1 showed no significant differences in accuracy across disciplines (p > 0.05), whereas ChatGPT-5.4 exhibited significant cross-disciplinary differences in Papers I, II, and III (p < 0.05).
Interpretation:
Deepseek-R1 demonstrated greater overall accuracy and consistency than ChatGPT-5.4.