Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models - Scorecard - MDSpire

By David Pompili, Yasmina Richa, Patrick Collins, Helen Richards, Derek B Hennessey

July 29, 2024

Clinical Scorecard: Evaluating Three Large Language Models for Generating Urology Patient Literature Using Artificial Intelligence

At a Glance

Condition: Common urological surgeries and conditions, including circumcision, nephrectomy, overactive bladder syndrome (OAB), and transurethral resection of the prostate (TURP)
Key Mechanisms: Use of large language models (ChatGPT-4, PaLM 2, Llama 2) to generate patient information leaflets (PILs) with medically accurate, understandable content tailored for laypersons
Target Population: Patients undergoing common urological procedures or with urological conditions requiring accessible educational materials
Care Setting: Urology clinical settings involving patient education and pre/post-operative care

Key Highlights

  • PILs generated by PaLM 2 had the highest overall quality scores, followed by those from Llama 2 and then ChatGPT-4.
  • PILs were evaluated on 20 quality criteria by a blinded panel of urology clinicians using a 5-point Likert scale.
  • Readability of PILs was assessed by averaging the results of seven validated readability formulas, to check accessibility for patients with varying literacy levels.
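
The panel scoring described above can be illustrated with a small sketch. This summary does not say how the individual ratings were combined into an overall score, so the code assumes a simple mean: each rater's 20 criterion ratings are averaged per leaflet, and those per-rater means are averaged per model. The function names, the data layout, and the placeholder model labels are all hypothetical, and the numbers are made up for illustration, not taken from the study.

```python
from statistics import mean

N_CRITERIA = 20          # number of quality criteria, as stated in the summary
LIKERT = range(1, 6)     # valid ratings on a 5-point Likert scale: 1..5


def leaflet_score(ratings):
    """Mean rating across all criteria for one rater's review of one leaflet."""
    if len(ratings) != N_CRITERIA:
        raise ValueError(f"expected {N_CRITERIA} ratings, got {len(ratings)}")
    if any(r not in LIKERT for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


def rank_models(panel):
    """Rank models by mean leaflet score, highest first.

    panel maps a model label to a list of rating lists, one list per rater.
    Assumption: overall quality is the mean of the per-rater means.
    """
    overall = {
        model: mean(leaflet_score(r) for r in rater_lists)
        for model, rater_lists in panel.items()
    }
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)


# Illustrative placeholder data only (two raters per model).
example_panel = {
    "model_a": [[4] * N_CRITERIA, [5] * N_CRITERIA],
    "model_b": [[3] * N_CRITERIA, [3] * N_CRITERIA],
}

for model, score in rank_models(example_panel):
    print(f"{model}: {score:.2f}")
```

Keeping the raters blinded to which model produced each leaflet, as the study did, would be handled outside this code, for example by relabelling the leaflets before review.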

Guideline-Based Recommendations

Diagnosis

  • Not applicable: the study focuses on generating patient information rather than on diagnostic criteria.

Management

  • Patient information leaflets should include all benefits, risks, and potential complications of procedures.
  • Information should describe pre- and post-operative expectations and encourage active patient participation in care.

Monitoring & Follow-up

  • No direct monitoring recommendations; however, quality and readability of patient materials should be regularly evaluated.

Risks

  • Ensure medical accuracy to avoid misinformation.
  • Tailor content to be understandable to laypersons to reduce confusion or anxiety.

Patient & Prescribing Data

Patients undergoing circumcision, nephrectomy, OAB treatment, or TURP

LLM-generated patient information leaflets can support patient understanding and engagement but vary in quality depending on the model used.

Clinical Best Practices

  • Use comprehensive, guideline-based prompts when generating patient educational materials with AI.
  • Employ multidisciplinary clinician panels to evaluate the quality and accuracy of AI-generated content.
  • Assess readability using multiple validated formulas to ensure materials are accessible to patients with diverse literacy levels.
  • Incorporate clear explanations of procedure benefits, risks, and patient roles in care to optimize outcomes.
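
The multi-formula readability check recommended above can be sketched as follows. This summary does not name the seven formulas used in the study, so the example assumes two well-known grade-level formulas that need no syllable counting: the Automated Readability Index and the Coleman-Liau index. The tokenisation is a simple regex heuristic, and all function names are hypothetical.

```python
import re


def text_stats(text):
    """Return (characters, words, sentences) using simple regex heuristics."""
    words = re.findall(r"[A-Za-z0-9'-]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Count only letters and digits, ignoring apostrophes and hyphens.
    chars = sum(ch.isalnum() for word in words for ch in word)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    return chars, len(words), len(sentences)


def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    chars, words, sentences = text_stats(text)
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def coleman_liau_index(text):
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S per 100 words."""
    chars, words, sentences = text_stats(text)
    letters_per_100 = chars / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def mean_grade_level(text, formulas=(automated_readability_index,
                                     coleman_liau_index)):
    """Average the grade-level estimates from several formulas."""
    scores = [formula(text) for formula in formulas]
    return sum(scores) / len(scores)


plain = "You will go home the same day. Drink lots of water. Call us if you feel sick."
dense = ("Transurethral resection of the prostate necessitates postoperative "
         "catheterisation and occasionally precipitates haemorrhagic complications.")

print(f"plain: {mean_grade_level(plain):.1f}")
print(f"dense: {mean_grade_level(dense):.1f}")
```

Averaging several formulas dampens the quirks of any single one. Only formulas on the same scale should be averaged; a 0 to 100 ease score would need converting to a grade level first.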
