Head-to-head evaluation of ChatGPT, DeepSeek, and Perplexity on acid–base disorder case clinical management and drug treatment: Accuracy, domain performance, and response consistency assessment

By
Moteb Khobrani
Asaad Ahmed Asaad Khalil
Salman Ashfaq Ahmad
Azfar Athar Ishaqui
June 8, 2026

Digital Health

Overview

This study evaluates three large language models (LLMs), ChatGPT, DeepSeek, and Perplexity, on the management of acid–base disorders across 75 clinical cases. The models differed in overall accuracy, response consistency, and domain-specific performance.

Background

Acid–base disorders occur across many medical specialties and can signal life-threatening conditions. Interpreting and managing them accurately is crucial for patient safety and effective treatment. As LLMs see growing use in healthcare, their capabilities and limitations in this domain need to be understood before they are integrated into clinical practice.

Data Highlights

Model	Overall Accuracy	Consistency
ChatGPT	85%	78%
DeepSeek	90%	82%
Perplexity	80%	75%
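
The summary does not state how the accuracy and consistency percentages were scored. As a minimal sketch only, assuming accuracy is the share of graded items marked correct and consistency is the share of cases where repeated prompts returned the same answer, the two metrics could be computed as follows. The function names and the toy data are hypothetical and are not taken from the study.

```python
def accuracy(graded_items):
    """Share of graded items marked correct (assumed definition)."""
    if not graded_items:
        raise ValueError("no graded items")
    return sum(1 for ok in graded_items if ok) / len(graded_items)


def consistency(repeated_answers):
    """Share of cases whose repeated answers are all identical (assumed definition)."""
    if not repeated_answers:
        raise ValueError("no cases")
    stable = sum(1 for answers in repeated_answers if len(set(answers)) == 1)
    return stable / len(repeated_answers)


# Hypothetical toy data, not results from the study.
graded = [True, True, False, True]
repeats = [
    ["metabolic acidosis", "metabolic acidosis", "metabolic acidosis"],
    ["respiratory alkalosis", "respiratory alkalosis", "mixed disorder"],
]

print(f"accuracy = {accuracy(graded):.0%}")
print(f"consistency = {consistency(repeats):.0%}")
```

Other definitions are possible, for example partial credit per interpretive step or agreement measured pairwise between runs, and they would yield different percentages from the same responses.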

Key Findings

DeepSeek had the highest overall accuracy, at 90% across the acid–base cases.
ChatGPT and Perplexity were less accurate, at 85% and 80%, respectively.
Response consistency followed the same order: DeepSeek 82%, ChatGPT 78%, and Perplexity 75%.
Performance varied significantly across different interpretive steps, indicating domain-specific strengths and weaknesses.
All models exhibited a tendency to hallucinate, emphasizing the need for cautious application in clinical settings.

Clinical Implications

Healthcare professionals should be aware that LLMs perform unevenly when used for acid–base disorder management. These models can assist in educational contexts, but relying on their outputs without verification may put patient safety at risk.

Conclusion

The comparative analysis underscores the potential of LLMs in clinical education and decision-making, while also highlighting the necessity for careful evaluation and validation of their outputs in practice.

Related Resources & Content

Original Source(s)

Head-to-head evaluation of ChatGPT, DeepSeek, and Perplexity on acid–base disorder case clinical management and drug treatment: Accuracy, domain performance, and response consistency assessment