Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models

By David Pompili, Yasmina Richa, Patrick Collins, Helen Richards, Derek B Hennessey
July 29, 2024

World Journal Of Urology

Clinical Scorecard: Evaluating Three Large Language Models for Generating Urology Patient Literature Using Artificial Intelligence

At a Glance

Category	Detail
Condition	Common urological surgeries and conditions including circumcision, nephrectomy, overactive bladder syndrome (OAB), and transurethral resection of the prostate (TURP)
Key Mechanisms	Use of large language models (ChatGPT-4, PaLM 2, Llama 2) to generate patient information leaflets (PILs) with medically accurate, understandable content tailored for laypersons
Target Population	Patients undergoing common urological procedures or with urological conditions requiring accessible educational materials
Care Setting	Urology clinical settings involving patient education and pre/post-operative care

Key Highlights

PaLM 2-generated PILs had the highest overall quality scores, followed by those from Llama 2 and then ChatGPT-4.
A blinded panel of urology clinicians rated each PIL against 20 quality criteria on a 5-point Likert scale.
Readability of each PIL was assessed by averaging the results of seven validated readability formulas, to check accessibility for patients with varying literacy levels.
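
The panel scoring described above can be sketched in a few lines of Python. This is only an illustration, not the study's analysis code: the function names, the input layout (one list of 1-5 scores per rater) and the placeholder model labels are assumptions, and no real study data is shown.

```python
from statistics import mean


def mean_quality_score(ratings):
    """Average 5-point Likert scores across all raters and criteria.

    `ratings` is a list with one entry per rater; each entry is that
    rater's list of scores (1-5), one score per quality criterion.
    This layout is an assumption made for the sketch.
    """
    scores = [score for rater in ratings for score in rater]
    if not scores:
        raise ValueError("no scores supplied")
    if any(score < 1 or score > 5 for score in scores):
        raise ValueError("Likert scores must be between 1 and 5")
    return mean(scores)


def rank_models(ratings_by_model):
    """Return (model, mean score) pairs, highest mean score first."""
    scored = [(model, mean_quality_score(r)) for model, r in ratings_by_model.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder data: two hypothetical models, two raters, two criteria each.
example = {
    "model_a": [[3, 4], [4, 5]],
    "model_b": [[2, 3], [3, 2]],
}
ranking = rank_models(example)
```

A plain mean treats every criterion and every rater equally; a real analysis might also report per-criterion means or inter-rater agreement.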

Guideline-Based Recommendations

Diagnosis

Not applicable; the study focuses on generating patient information rather than on diagnostic criteria.

Management

Patient information leaflets should include all benefits, risks, and potential complications of procedures.
Information should describe pre- and post-operative expectations and encourage active patient participation in care.

Monitoring & Follow-up

The study makes no direct monitoring recommendations; however, the quality and readability of patient materials should be evaluated regularly.

Risks

Ensure medical accuracy to avoid misinformation.
Tailor content to be understandable to laypersons to reduce confusion or anxiety.

Patient & Prescribing Data

Patients undergoing circumcision, nephrectomy, OAB treatment, or TURP

LLM-generated patient information leaflets can support patient understanding and engagement but vary in quality depending on the model used.

Clinical Best Practices

Use comprehensive, guideline-based prompts when generating patient educational materials with AI.
Employ multidisciplinary clinician panels to evaluate the quality and accuracy of AI-generated content.
Assess readability using multiple validated formulas to ensure materials are accessible to patients with diverse literacy levels.
Incorporate clear explanations of procedure benefits, risks, and patient roles in care to optimize outcomes.
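
As a rough illustration of the multi-formula readability check, the sketch below averages two well-known grade-level formulas, the Automated Readability Index and the Coleman-Liau Index. The summary does not say which seven formulas the study used, so this choice is an assumption, as are the function names and the simple regex-based sentence and word counting.

```python
import re


def text_stats(text):
    """Return (sentences, words, characters) using simple regex heuristics."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Simplification: count letters and digits for both formulas.
    chars = sum(ch.isalnum() for word in words for ch in word)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return len(sentences), len(words), chars


def automated_readability_index(text):
    n_sent, n_words, n_chars = text_stats(text)
    return 4.71 * n_chars / n_words + 0.5 * n_words / n_sent - 21.43


def coleman_liau_index(text):
    n_sent, n_words, n_chars = text_stats(text)
    letters_per_100_words = n_chars / n_words * 100
    sentences_per_100_words = n_sent / n_words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


def average_grade_level(text, formulas=(automated_readability_index, coleman_liau_index)):
    """Mean of the grade-level estimates produced by each formula."""
    return sum(formula(text) for formula in formulas) / len(formulas)


easy = "The cat sat. The dog ran."
hard = "Transurethral resection of the prostate relieves urinary obstruction."
easy_grade = average_grade_level(easy)
hard_grade = average_grade_level(hard)
```

Both formulas estimate a school grade level, so a lower average suggests easier text. Very short samples like these give unstable values; a full leaflet is a more meaningful input.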

References

Readability Formulas Calculator
