Performance of deepseek-R1 and ChatGPT-5.4 thinking in the medical laboratory professional title examination: accuracy, stability, and comparison with interns

By Zhili Niu, Dongling Tang, Juanjuan Chen, Pingan Zhang, Chengliang Zhu

June 19, 2026


  1. The study evaluated the accuracy and reproducibility of DeepSeek-R1 and ChatGPT-5.4 on the Medical Laboratory Junior Professional Title Examination.

  2. Both AI models showed good reproducibility, with Fleiss' kappa coefficients above 0.7, indicating stable performance.

  3. DeepSeek-R1 was more accurate than ChatGPT-5.4 on most examination papers, particularly Papers I, II, and III.

  4. Interns performed comparably to the AI models only on Paper I and scored significantly lower on Papers II, III, and IV.

  5. DeepSeek-R1 showed better overall performance and greater consistency across disciplines than ChatGPT-5.4.
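For readers unfamiliar with the reproducibility measure cited in the takeaways, the following is a minimal sketch of how Fleiss' kappa can be computed when each repeated run of a model is treated as one "rater" answering the same multiple-choice questions. The function name `fleiss_kappa`, the answer options, the number of runs, and the sample answers are all illustrative assumptions; none of them come from the study, and this summary does not describe the authors' actual procedure.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated the same number of times.

    ratings: list of lists; ratings[i] holds the answers given to question i
             across repeated runs (hypothetical layout, not the study's data).
    categories: iterable of all possible answer options.
    """
    categories = list(categories)
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # How many runs chose each option, per question.
    counts = [Counter(r) for r in ratings]

    # Observed agreement per question, then averaged over questions.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Agreement expected by chance from the overall option proportions.
    p_cat = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_e = sum(p ** 2 for p in p_cat)

    if p_e == 1:
        # Every run picked the same single option on every question; kappa is
        # formally undefined here. Returning 1.0 is a convention of this sketch.
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Made-up answers: 4 questions, 3 repeated runs each.
    sample = [
        ["A", "A", "B"],
        ["B", "B", "B"],
        ["A", "B", "B"],
        ["A", "A", "A"],
    ]
    print(round(fleiss_kappa(sample, "AB"), 3))
```

Under this reading, a value above 0.7 would mean the repeated runs agree with each other far more often than chance alone would predict.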
