Clinical Report: Comparing Physician and AI-Generated Communications in Urology
Overview
This study compared answers to common benign prostatic hyperplasia (BPH) patient questions written by four urologists with answers generated by two AI chatbots running in a secure sandbox environment, assessing accuracy, completeness, tone, and patient preference. Both chatbots, including a retrieval-augmented model, performed reliably and comparably to physicians in providing accurate and comprehensive responses. Expert and patient evaluations suggest that AI tools could support clinical communication in urology.
Background
Artificial intelligence, particularly large language models (LLMs), is increasingly being integrated into healthcare to enhance physician-patient communication. Chatbots such as ChatGPT can convey complex medical information, but they raise privacy and data-security concerns when used outside controlled environments. BPH, a common urologic condition in men over 50, is well suited as a test case for chatbot evaluation because it generates a high volume of patient messages and is clinically relevant. This study used a sandbox environment to securely test AI-generated responses to real-world patient questions across the BPH care continuum.
Data Highlights
Twenty common BPH-related patient questions were answered by four board-certified urologists and two AI chatbots (Kaiser Permanente GPT and SurgiChat). Two urologist subject matter experts and five male volunteers aged 56–82 rated the responses for accuracy, completeness, tone, and preference using Likert scales. Because testing took place in the sandbox environment, no patient health information was exposed. The chatbots were prompted to give specific answers and to incorporate the applied sources, and some disclaimers were removed from their answers so that evaluators remained blinded to the source of each response.
Key Findings
Both AI chatbots produced answers with accuracy and completeness comparable to those of experienced urologists based on expert grading.
The retrieval-augmented chatbot (SurgiChat) leveraged authoritative BPH literature to enhance response quality within the sandbox environment.
Patient volunteers rated chatbot responses favorably in terms of tone and clarity, indicating good acceptance of AI-generated communications.
Use of a secure sandbox environment allowed robust testing of AI tools without risking patient data privacy or security.
Chatbots handled open-ended, personalized, patient-specific questions across the perioperative BPH care spectrum.
Clinical Implications
AI chatbots, when integrated within secure healthcare environments, can reliably support physician-patient communication by providing accurate, comprehensive, and empathetic information on common urologic conditions like BPH. Their use may enhance patient education and engagement while reducing clinician workload in managing routine inquiries. However, careful implementation with attention to data security and clinical oversight remains essential.
Conclusion
This study provides early evidence that AI chatbots can effectively complement physician communication in urology by delivering accurate and complete information in a patient-centered manner. Secure sandbox testing frameworks enable safe evaluation and future integration of such technologies into clinical practice.
by Eric J. Robinson, Chunyuan Qiu, Stuart Sands, Mohammad Khan, Shivang Vora, Kenichiro Oshima, Khang Nguyen, L. Andrew DiFronzo, David Rhew, Mark I. Feng