Clinical Scorecard: Comparing Accuracy, Completeness, and Preferences of Physician and AI-Generated Communications in Urology: Insights from Patients and Doctors
At a Glance
Category | Detail
Condition | Benign Prostatic Hyperplasia (BPH)
Key Mechanisms | Use of Large Language Models (LLMs) and chatbots to generate clinical communication responses; evaluation of accuracy, completeness, and tone in patient-physician communication
Target Population | Males over 50 years with BPH
Care Setting | Urology outpatient and perioperative clinical communication
Key Highlights
Chatbots like ChatGPT and specialized versions (KPGPT, SurgiChat) can generate accurate, comprehensive, and empathetic responses to common BPH patient questions.
Sandbox environments enable secure testing of AI tools without risking exposure of protected health information (PHI).
Evaluation involved real-world patient questions, expert-generated answer keys, and blinded assessments by urologists and patient volunteers.
Guideline-Based Recommendations
Diagnosis
Utilize expert-generated standardized answer keys to assess accuracy of AI and physician responses to patient inquiries.
Management
Incorporate AI chatbots as adjunct tools for delivering real-time, personalized information to patients with BPH.
Ensure chatbot responses are specific and reference authoritative sources when applicable.
Monitoring & Follow-up
Conduct blinded evaluations of AI and physician communications for accuracy, completeness, and tone using Likert scales.
Engage both subject matter experts and representative patient populations in assessing communication quality.
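The blinded Likert assessment described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual analysis: the 1-5 scale, the field names, the rater labels, the `summarize` helper, and every score are assumptions made for the example. The point it shows is that raters score responses by ID only, and the mapping from ID to source (chatbot or physician) is applied after all ratings are collected.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical blinded ratings: each rater sees only a response ID, never the source.
# Scores use an assumed 1-5 Likert scale; all values are illustrative, not study data.
ratings = [
    {"response_id": "R1", "rater": "urologist_1", "accuracy": 5, "completeness": 4, "tone": 4},
    {"response_id": "R1", "rater": "patient_1", "accuracy": 4, "completeness": 4, "tone": 5},
    {"response_id": "R2", "rater": "urologist_1", "accuracy": 4, "completeness": 3, "tone": 3},
    {"response_id": "R2", "rater": "patient_1", "accuracy": 4, "completeness": 5, "tone": 4},
]

# The unblinding key is kept separate and applied only after all ratings are collected.
unblinding_key = {"R1": "chatbot", "R2": "physician"}

CRITERIA = ("accuracy", "completeness", "tone")


def summarize(ratings, key):
    """Return the mean Likert score per source and criterion."""
    scores = defaultdict(lambda: defaultdict(list))
    for row in ratings:
        source = key[row["response_id"]]
        for criterion in CRITERIA:
            value = row[criterion]
            # Reject scores outside the assumed 1-5 range.
            if not 1 <= value <= 5:
                raise ValueError(f"Likert score out of range: {value}")
            scores[source][criterion].append(value)
    return {
        source: {c: mean(vals) for c, vals in by_criterion.items()}
        for source, by_criterion in scores.items()
    }


summary = summarize(ratings, unblinding_key)
for source, by_criterion in summary.items():
    print(source, by_criterion)
```

Keeping the unblinding key out of the rating records means the same summary code works for both clinician and patient raters, and the two groups can be compared by filtering on the `rater` field before calling `summarize`.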
Risks
Avoid use of AI tools that process PHI outside secure healthcare ecosystems to prevent data breaches and legal liabilities.
Implement sandbox environments to mitigate risks associated with AI integration in clinical settings.
Patient & Prescribing Data
Male patients aged 50 years and older with BPH, including those with prior treatment experience
Patients prefer clear, accurate, and empathetic communication; AI chatbots can meet these needs when properly integrated and evaluated.
Clinical Best Practices
Use sandbox environments for AI chatbot testing to protect patient data privacy and security.
Develop and utilize expert-validated answer keys to benchmark AI and physician communication accuracy.
Engage multidisciplinary evaluators including clinicians and patient representatives to assess communication tools.
Truncate non-professional disclaimers (for example, a chatbot's statement that it is not a medical professional) from chatbot responses during blinded evaluations to maintain assessment objectivity.
Leverage Retrieval-Augmented Generation (RAG) to enhance chatbot responses with authoritative literature.
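As a rough illustration of the Retrieval-Augmented Generation practice listed above, the sketch below covers only the retrieval and prompt-augmentation steps; the call to a language model is deliberately omitted. Everything in it is an assumption for the example: the passages are placeholder text rather than quoted guideline content, the source labels are hypothetical, and simple word overlap stands in for the embedding-based search a production system would more likely use.

```python
import re

# Hypothetical reference passages standing in for an authoritative literature store.
# The text is placeholder content, not quoted from any guideline.
PASSAGES = [
    {"source": "placeholder-guideline-A", "text": "Overview of medication options for urinary symptoms."},
    {"source": "placeholder-guideline-B", "text": "What to expect during recovery after prostate surgery."},
    {"source": "placeholder-guideline-C", "text": "Lifestyle changes that may reduce nighttime urination."},
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, passages, top_k=2):
    """Rank passages by word overlap with the question (a stand-in for embedding search)."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p["text"])), p) for p in passages]
    # Drop passages that share no words with the question.
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def build_prompt(question, passages):
    """Prepend retrieved passages, with source labels, to the patient question."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer using only the context below and cite the source labels.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


question = "What is recovery like after surgery?"
prompt = build_prompt(question, retrieve(question, PASSAGES))
print(prompt)
```

Carrying the source label alongside each passage is what lets the generated answer reference authoritative sources, which is the behavior the recommendations in this scorecard ask for.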
by Eric J. Robinson, Chunyuan Qiu, Stuart Sands, Mohammad Khan, Shivang Vora, Kenichiro Oshima, Khang Nguyen, L. Andrew DiFronzo, David Rhew, Mark I. Feng