Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians

Category	Detail
Condition	Benign Prostatic Hyperplasia (BPH)
Key Mechanisms	Use of Large Language Models (LLMs) and chatbots to generate clinical communication responses; evaluation of accuracy, completeness, and tone in patient-physician communication
Target Population	Males over 50 years with BPH
Care Setting	Urology outpatient and perioperative clinical communication

Chatbots such as ChatGPT and specialized variants (KPGPT, SurgiChat) can generate accurate, comprehensive, and empathetic responses to common BPH patient questions.
Sandbox environments enable secure testing of AI tools without risking exposure of protected health information (PHI).
Evaluation involved real-world patient questions, expert-generated answer keys, and blinded assessments by urologists and patient volunteers.

Use expert-generated, standardized answer keys to assess the accuracy of AI and physician responses to patient inquiries.

Incorporate AI chatbots as adjunct tools for delivering real-time, personalized information to patients with BPH.
Ensure chatbot responses are specific and reference authoritative sources when applicable.

Conduct blinded evaluations of AI and physician communications for accuracy, completeness, and tone using Likert scales.
Engage both subject matter experts and representative patient populations in assessing communication quality.
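A blinded Likert evaluation of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: the 5-point scale, the record layout, and the function names `blind` and `summarize` are hypothetical and are not taken from the study. It hides the source of each response behind an opaque ID, then aggregates ratings per source and criterion once the key is applied.

```python
import random
from statistics import mean, median


def blind(responses, seed=0):
    """Shuffle (source, text) pairs and replace the source with an opaque ID.

    Returns (blinded, key): blinded items carry only an ID and the text;
    key maps each ID back to its source and is used only after rating.
    """
    rng = random.Random(seed)
    shuffled = list(responses)
    rng.shuffle(shuffled)
    blinded, key = [], {}
    for i, (source, text) in enumerate(shuffled, start=1):
        item_id = f"R{i}"
        blinded.append({"id": item_id, "text": text})
        key[item_id] = source
    return blinded, key


def summarize(ratings, key):
    """Unblind ratings and aggregate them per (source, criterion).

    Assumes a 1-5 Likert scale (an assumption, not stated in the source).
    """
    grouped = {}
    for r in ratings:
        if not 1 <= r["score"] <= 5:
            raise ValueError(f"Likert score out of range: {r['score']}")
        grouped.setdefault((key[r["id"]], r["criterion"]), []).append(r["score"])
    return {
        group: {"n": len(scores), "mean": mean(scores), "median": median(scores)}
        for group, scores in grouped.items()
    }


if __name__ == "__main__":
    # Hypothetical placeholder responses; real text would come from the study.
    blinded, key = blind([("physician", "Response A"), ("ai", "Response B")])
    ratings = [
        {"id": item["id"], "criterion": c, "score": 4}
        for item in blinded
        for c in ("accuracy", "completeness", "tone")
    ]
    print(summarize(ratings, key))
```

Keeping the unblinding key separate from the items shown to raters means evaluators never see which responses came from a physician and which from a chatbot until scoring is complete.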

Avoid AI tools that process PHI outside secure healthcare ecosystems; doing so risks data breaches and legal liability.
Implement sandbox environments to mitigate risks associated with AI integration in clinical settings.

Male patients aged 50 years and older with BPH, including those with prior treatment experience

Patients prefer clear, accurate, and empathetic communication; AI chatbots can meet these needs when properly integrated and evaluated.

Use sandbox environments for AI chatbot testing to protect patient data privacy and security.
Develop and utilize expert-validated answer keys to benchmark AI and physician communication accuracy.
Engage multidisciplinary evaluators including clinicians and patient representatives to assess communication tools.
Strip the chatbot's non-professional disclaimers from responses before blinded evaluation so that they do not bias the assessment.
Leverage Retrieval-Augmented Generation (RAG) to enhance chatbot responses with authoritative literature.
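The retrieval step behind RAG can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the word-overlap ranking is a toy stand-in for the embedding search a production system would use, the passage records and the names `retrieve` and `build_prompt` are hypothetical, and no model is actually called. The sketch selects the passages most relevant to a patient question and places them in the prompt so that the chatbot can ground and cite its answer.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=2):
    """Return up to k passages ranked by word overlap with the question.

    Word overlap is a toy stand-in for embedding similarity; passages
    sharing no words with the question are dropped.
    """
    q = tokenize(question)
    scored = [(len(q & tokenize(p["text"])), i, p) for i, p in enumerate(passages)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, stable on ties
    return [p for score, _, p in scored[:k] if score > 0]


def build_prompt(question, retrieved):
    """Assemble a prompt that asks the model to answer from numbered sources."""
    sources = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(retrieved, start=1)
    )
    return (
        "Answer the patient question using only the sources below. "
        "Cite sources by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Placeholder passages; a real corpus would hold vetted clinical literature.
    corpus = [
        {"source": "doc-a", "text": "Placeholder passage on recovery time after surgery."},
        {"source": "doc-b", "text": "Placeholder passage on diet and hydration."},
    ]
    question = "What is the recovery time after surgery?"
    print(build_prompt(question, retrieve(question, corpus)))
```

Numbering the sources in the prompt gives the model a simple way to reference them, which supports the recommendation that responses cite authoritative material where applicable.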

Clinical Scorecard: Comparing Accuracy, Completeness, and Preferences of Physician and AI-Generated Communications in Urology: Insights from Patients and Doctors