Head-to-head evaluation of ChatGPT, DeepSeek, and Perplexity on acid–base disorder case clinical management and drug treatment: Accuracy, domain performance, and response consistency assessment

By
Moteb Khobrani
Asaad Ahmed Asaad Khalil
Salman Ashfaq Ahmad
Azfar Athar Ishaqui
June 8, 2026
0 min

Digital Health

Objective:

To evaluate the performance of ChatGPT, DeepSeek, and Perplexity on a standardized set of acid-base disturbance cases, focusing on accuracy, domain-specific performance, and consistency of responses, highlighting the significance of these evaluations in medical education.

Key Findings:

Performance of LLMs varied across different acid-base disturbance cases, with specific metrics indicating varying levels of accuracy.
Accuracy and consistency of responses were evaluated across 510 multiple-choice questions, revealing distinct performance patterns.
Subgroup analysis classified cases into metabolic, respiratory, or mixed acid-base disorders, providing insights into model strengths and weaknesses.

Interpretation:

The study highlights the need for context-specific benchmarking of LLMs in medical education, particularly in complex domains like acid-base disorders, and suggests that improved accuracy could enhance learner outcomes.

Limitations:

The dataset was derived from a single source, which may limit generalizability and introduce bias.
Original clinical case vignettes were not reproduced verbatim due to copyright restrictions, potentially affecting the authenticity of the assessment.

Conclusion:

The study provides insights into the comparative performance of LLMs in managing acid-base disorders, emphasizing the importance of accuracy and consistency in clinical contexts.

Head-to-head evaluation of ChatGPT, DeepSeek, and Perplexity on acid–base disorder case clinical management and drug treatment: Accuracy, domain performance, and response consistency assessment

Objective:

Key Findings:

Interpretation:

Limitations:

Conclusion:

Original Source(s)

Head-to-head evaluation of ChatGPT, DeepSeek, and Perplexity on acid–base disorder case clinical management and drug treatment: Accuracy, domain performance, and response consistency assessment

Related Content

FDA Approves Cefepime-Zidebactam Injection

Top 10 States With the Most Favorable Tax Climate for Physicians

Celiac Tied to Higher Transplant Rate