Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians - Scorecard - MDSpire


By Eric J. Robinson, Chunyuan Qiu, Stuart Sands, Mohammad Khan, Shivang Vora, Kenichiro Oshima, Khang Nguyen, L. Andrew DiFronzo, David Rhew, and Mark I. Feng

December 27, 2024


Clinical Scorecard: Accuracy, Completeness, and Preference in Physician vs. AI-Generated Communication in Urology, as Evaluated by Patients and Physicians

At a Glance

Condition: Benign Prostatic Hyperplasia (BPH)
Key Mechanisms: Use of large language models (LLMs) and chatbots to generate responses to patient messages; evaluation of the accuracy, completeness, and tone of patient-physician communication
Target Population: Men aged 50 years and older with BPH
Care Setting: Urology outpatient and perioperative clinical communication

Key Highlights

  • Chatbots such as ChatGPT, along with specialized versions (KPGPT, SurgiChat), can generate accurate, comprehensive, and empathetic responses to common BPH patient questions.
  • Sandbox environments allow AI tools to be tested securely without exposing protected health information (PHI).
  • Evaluation involved real-world patient questions, expert-generated answer keys, and blinded assessments by urologists and patient volunteers.

Guideline-Based Recommendations

Diagnosis

  • Use expert-generated, standardized answer keys to assess the accuracy of AI and physician responses to patient inquiries.

Management

  • Incorporate AI chatbots as adjunct tools for delivering real-time, personalized information to patients with BPH.
  • Ensure chatbot responses are specific and reference authoritative sources when applicable.

Monitoring & Follow-up

  • Conduct blinded evaluations of AI and physician communications for accuracy, completeness, and tone using Likert scales.
  • Engage both subject matter experts and representative patient populations in assessing communication quality.
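A blinded comparison of this kind can be organized so that evaluators never see which responses came from a physician and which from a chatbot. The Python sketch below is illustrative only: the `Response` record, the `blind` and `summarize` helpers, the 5-point scale, and the use of the median are all assumptions, not the study's actual protocol.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Response:
    question_id: str
    source: str  # true author, e.g. "physician" or "chatbot"; hidden from evaluators
    text: str


def blind(responses, seed=0):
    """Shuffle responses and replace each source with an opaque item ID.

    Returns (items shown to evaluators, private key mapping item ID -> source).
    Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    shuffled = list(responses)
    rng.shuffle(shuffled)
    items, key = [], {}
    for i, r in enumerate(shuffled):
        item_id = f"item-{i:03d}"
        key[item_id] = r.source
        items.append({"id": item_id, "question_id": r.question_id, "text": r.text})
    return items, key


def summarize(ratings, key, scale=(1, 5)):
    """Unblind ratings and return the median score per (source, criterion).

    `ratings` is a list of (item_id, criterion, score) tuples, where criterion
    is e.g. "accuracy", "completeness", or "tone" and score is an integer on
    an assumed 5-point Likert scale. Hypothetical helper for illustration.
    """
    lo, hi = scale
    grouped = {}
    for item_id, criterion, score in ratings:
        if not lo <= score <= hi:
            raise ValueError(f"score {score} outside Likert range {scale}")
        grouped.setdefault((key[item_id], criterion), []).append(score)
    return {k: statistics.median(v) for k, v in grouped.items()}


if __name__ == "__main__":
    responses = [
        Response("q1", "physician", "Example physician reply."),
        Response("q1", "chatbot", "Example chatbot reply."),
    ]
    items, key = blind(responses, seed=1)
    # Invert the key (valid here because each source appears only once).
    to_id = {src: item_id for item_id, src in key.items()}
    ratings = [
        (to_id["physician"], "accuracy", 4), (to_id["physician"], "accuracy", 5),
        (to_id["chatbot"], "accuracy", 5), (to_id["chatbot"], "accuracy", 5),
    ]
    print(summarize(ratings, key))
```

Ratings are aggregated by median here because Likert responses are ordinal; a real analysis would also report the spread of scores and apply an appropriate statistical test.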

Risks

  • Avoid AI tools that process PHI outside secure healthcare ecosystems, to prevent data breaches and legal liability.
  • Implement sandbox environments to mitigate risks associated with AI integration in clinical settings.

Patient & Prescribing Data

Male patients aged 50 years and older with BPH, including those with prior treatment experience.

Patients prefer clear, accurate, and empathetic communication; AI chatbots can meet these needs when properly integrated and evaluated.

Clinical Best Practices

  • Use sandbox environments for AI chatbot testing to protect patient data privacy and security.
  • Develop and utilize expert-validated answer keys to benchmark AI and physician communication accuracy.
  • Engage multidisciplinary evaluators including clinicians and patient representatives to assess communication tools.
  • Truncate disclaimers in chatbot responses (such as statements that the chatbot is not a medical professional) during blinded evaluations, so evaluators cannot infer the source and assessments stay objective.
  • Leverage Retrieval-Augmented Generation (RAG) to enhance chatbot responses with authoritative literature.
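Disclaimer truncation can be partly automated before responses reach evaluators. The sketch below assumes a hypothetical, hand-maintained list of regular expressions for chatbot self-identification (for example, "As an AI language model..."); the patterns and the `strip_disclaimers` helper are illustrative, not the study's own rules.

```python
import re

# Hypothetical patterns for chatbot self-identification disclaimers.
# Each pattern matches from the disclaimer phrase up to the next period.
DISCLAIMER_PATTERNS = [
    r"\bAs an AI(?: language model)?\b[^.]*\.",
    r"\bI(?:'m| am) not a (?:doctor|physician|medical professional)\b[^.]*\.",
]

_DISCLAIMER_RE = re.compile("|".join(DISCLAIMER_PATTERNS), re.IGNORECASE)


def strip_disclaimers(text: str) -> str:
    """Remove disclaimer sentences from a chatbot response before blinded review."""
    cleaned = _DISCLAIMER_RE.sub("", text)
    # Collapse the runs of whitespace left behind by removed sentences.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    reply = ("As an AI language model, I cannot examine you. "
             "BPH is common in older men. "
             "I'm not a doctor, so check with your care team.")
    print(strip_disclaimers(reply))  # BPH is common in older men.
```

Because each pattern stops at the first period, a disclaimer containing an abbreviation such as "e.g." may be only partially removed, so truncated responses should still be checked by hand before blinded rating.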
