Large language model chatbots as sources of pediatric anesthesia health advice: An evaluation of reliability and readability

By
Xue Zhang
Yuchen Dai
Xin Zhao
Lin Wu
Boming Shao
Xisheng Shan
Fuhai Ji
Runzhi Deng
Baojian Zhao
June 29, 2026

Digital Health

Objective:

To evaluate the reliability and accessibility of health information provided by large language models (LLMs) for pediatric anesthesia, with the aim of informing the integration of AI-driven tools into perioperative education.

Approach:

Study Design: A cross-sectional observational analysis was conducted, following STROBE and CHART guidelines, assessing 72 responses from four LLM-based chatbots to pediatric anesthesia questions.
Data Collection: Standardized search terms related to pediatric anesthesia were identified and used to generate prompts for the chatbots, reflecting real-world caregiver concerns.

Key Findings:

The study evaluated responses from ChatGPT, Claude, DeepSeek, and Gemini regarding pediatric anesthesia.
Responses were assessed for readability, informational reliability, and overall quality, revealing variability in performance across the models.
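To make the readability assessment concrete, here is a minimal sketch of how a chatbot response could be scored. The summary does not say which readability indices the study used, so the Flesch Reading Ease and Flesch-Kincaid Grade Level formulas below are an assumption chosen for illustration. The function names and the vowel-group syllable heuristic are likewise hypothetical and only approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for a text.

    Assumption: these two indices stand in for whatever readability
    measures the study actually applied.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        # Higher score = easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade needed to understand the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    # Hypothetical response text, not taken from the study.
    sample = "Your child will sleep during the operation. The doctor will watch closely."
    print(readability(sample))
```

Scoring each response this way and comparing the averages per chatbot would be one way to quantify the variability in readability that the study reports.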

Interpretation:

The findings indicate a need for accurate and comprehensible information from LLMs to support caregivers in pediatric anesthesia contexts.

Limitations:

No formal sample size calculation was performed because the study was observational.
Responses were limited to predefined prompts and may not encompass all caregiver concerns.

Conclusion:

The study highlights the necessity of evaluating LLM-generated health information for pediatric anesthesia to ensure it is safe and clear for caregivers.


