Performance of deepseek-R1 and ChatGPT-5.4 thinking in the medical laboratory professional title examination: accuracy, stability, and comparison with interns

By
Zhili Niu
Dongling Tang
Juanjuan Chen
Pingan Zhang
Chengliang Zhu
June 19, 2026

Frontiers In Digital Health

Objective:

To systematically evaluate the accuracy and reproducibility of Deepseek-R1 and ChatGPT-5.4 on the Medical Laboratory Junior Professional Title Examination, and to compare their performance with that of interns.

Approach:

Key Findings:

Neither model showed significant differences in accuracy across the three repeated sessions (p > 0.05).
No significant differences in accuracy were observed across question types for either model (p > 0.05).
Across disciplines, Deepseek-R1 showed no significant differences in accuracy (p > 0.05), whereas ChatGPT-5.4 showed significant cross-disciplinary differences in Papers I, II, and III (p < 0.05).
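The summary does not state which statistical test was used to compare accuracy across the three repeated sessions. Because every session answers the same questions, one common choice for this design is Cochran's Q test for related binary outcomes (correct or incorrect). The following is a minimal sketch that assumes that test. The function name `cochrans_q` and the sample data are hypothetical illustrations, not the authors' code or results. With three sessions the test has 2 degrees of freedom, and the chi-square survival function then reduces exactly to exp(-Q/2), so no statistics library is needed.

```python
import math
from typing import Sequence


def cochrans_q(responses: Sequence[Sequence[int]]) -> tuple[float, float]:
    """Cochran's Q test for k related binary samples (sketch, k = 3 only).

    responses[i][j] is 1 if question i was answered correctly in session j,
    otherwise 0. Returns (Q, p_value), where the p-value uses the chi-square
    approximation with k - 1 = 2 degrees of freedom, i.e. p = exp(-Q / 2).
    Assumption: this is an illustrative choice of test, not the study's method.
    """
    k = len(responses[0])
    if k != 3 or any(len(row) != k for row in responses):
        raise ValueError("this sketch expects exactly 3 sessions per question")

    col_totals = [sum(row[j] for row in responses) for j in range(k)]  # correct answers per session
    row_totals = [sum(row) for row in responses]  # correct answers per question
    n = sum(row_totals)  # total correct answers

    denom = k * n - sum(r * r for r in row_totals)
    if denom == 0:
        # Every question got the same outcome in all sessions: no variation to test.
        return 0.0, 1.0

    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, math.exp(-q / 2)


# Hypothetical data: 6 questions x 3 sessions (NOT taken from the study).
sessions = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
]
q, p = cochrans_q(sessions)
print(f"Q = {q:.3f}, p = {p:.3f}")  # prints "Q = 0.000, p = 1.000"
```

In this toy example each session answers 4 of 6 questions correctly, so Q is 0 and there is no evidence of a session effect, which is the kind of pattern a "p > 0.05 across repeated sessions" finding reflects. Cochran's Q is preferred over an ordinary chi-square test of independence here because the three sessions are not independent samples: they are repeated answers to the same items.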

Interpretation:

Deepseek-R1 demonstrated greater overall accuracy and more consistent performance across disciplines than ChatGPT-5.4.

Limitations:

The use of publicly available historical questions limits conclusions about the models' genuine reasoning ability.

Conclusion:

Both Deepseek-R1 and ChatGPT-5.4 showed strong performance and reproducibility, with Deepseek-R1 outperforming interns and ChatGPT-5.4.

Original Source(s)

Performance of deepseek-R1 and ChatGPT-5.4 thinking in the medical laboratory professional title examination: accuracy, stability, and comparison with interns