Objective:
To evaluate the reliability and accessibility of health information provided by large language models (LLMs) for pediatric anesthesia, with the aim of informing the integration of AI-driven tools into perioperative education.
Approach:
Study Design: A cross-sectional observational study, reported in accordance with the STROBE and CHART guidelines, assessing 72 responses generated by four LLM-based chatbots to questions about pediatric anesthesia.
Data Collection: Standardized search terms related to pediatric anesthesia, chosen to reflect real-world caregiver concerns, were identified and used to construct the chatbot prompts.
Key Findings:
Responses from ChatGPT, Claude, DeepSeek, and Gemini to the pediatric anesthesia prompts were evaluated.
Responses were assessed for readability, informational reliability, and overall quality; performance on these measures varied across the models.
Interpretation:
The findings underscore the need for LLMs to provide accurate and comprehensible information if they are to support caregivers in pediatric anesthesia contexts.
Limitations:
No formal statistical sample size calculation was performed, given the observational design.
Responses were limited to predefined prompts, which may not capture the full range of caregiver concerns.
Conclusion:
The study highlights the need to evaluate LLM-generated health information on pediatric anesthesia to ensure that it is safe and clear for caregivers.