Performance of deepseek-R1 and ChatGPT-5.4 thinking in the medical laboratory professional title examination: accuracy, stability, and comparison with interns

By
Zhili Niu
Dongling Tang
Juanjuan Chen
Pingan Zhang
Chengliang Zhu
June 19, 2026

Frontiers In Digital Health

Clinical Scorecard: Evaluation of DeepSeek-R1 and ChatGPT-5.4 Performance in the Medical Laboratory Junior Professional Title Examination: A Comparison of Accuracy, Consistency, and Intern Results

At a Glance

Category	Detail
Condition	Medical Laboratory Junior Professional Title Examination
Key Mechanisms	Evaluation of AI models' accuracy and reproducibility in examination settings.
Target Population	Final-year medical laboratory interns and AI models.
Care Setting	Medical education and examination preparation.

Key Highlights

DeepSeek-R1 outperformed ChatGPT-5.4 in accuracy across most examination papers.
Both AI models demonstrated strong reproducibility with Fleiss' kappa coefficients exceeding 0.7.
Interns performed comparably to the AI models only on Paper I and scored lower on the other papers.
ChatGPT-5.4 exhibited significant cross-disciplinary differences in performance.
Stable knowledge gaps were identified through analysis of error types.
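
For readers unfamiliar with the reproducibility statistic cited above, the following is a minimal sketch of how Fleiss' kappa can be computed when the same model answers each question several times. It treats each repeated run as a "rater" and each answer option as a category. The function name `fleiss_kappa` and the sample answers are hypothetical and purely illustrative. They are not the study's data or the authors' code.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of categorical ratings.

    ratings[i] holds the answers given to question i, one per repeated run.
    Every question must have the same number of runs (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of agreeing rater pairs for this item.
        agreement_sum += (
            sum(c * c for c in counts.values()) - n_raters
        ) / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total = n_items * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        return 1.0  # every rating fell into a single category
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: four questions, three repeated runs each.
sample_runs = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["C", "C", "D"],
    ["D", "D", "D"],
]
kappa = fleiss_kappa(sample_runs)
print(f"Fleiss' kappa: {kappa:.3f}")
```

Values near 1 indicate that repeated runs almost always give the same answer, and values near 0 indicate agreement no better than chance. The summary's threshold of 0.7 is the figure reported for both models.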

Guideline-Based Recommendations

Diagnosis

Management

Monitoring & Follow-up

Risks

Patient & Prescribing Data

Not applicable; study focused on AI models and interns.

AI models may serve as auxiliary tools for examination preparation.

Clinical Best Practices

Utilize AI models for personalized learning support in medical education.
Incorporate AI performance evaluations in the assessment of medical knowledge.

Related Resources & Content

Study on AI performance in medical exams

Original Source(s)

Frontiers In Digital Health

Performance of deepseek-R1 and ChatGPT-5.4 thinking in the medical laboratory professional title examination: accuracy, stability, and comparison with interns

by Zhili Niu, Dongling Tang, Juanjuan Chen, Pingan Zhang, Chengliang Zhu
June 19, 2026
