Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models - Summary - MDSpire

By David Pompili, Yasmina Richa, Patrick Collins, Helen Richards, and Derek B Hennessey

July 29, 2024

Objective:

To explore the ability of three mainstream large language models (LLMs) to generate accurate patient information leaflets (PILs) on urological topics, to assess the readability of these leaflets, and to compare the models' performance.

Key Findings:
  • PaLM 2 generated the highest average quality score (3.58) for PILs, followed by Llama 2 (3.34) and ChatGPT-4 (3.08). However, the differences in quality scores between the LLMs were not statistically significant for the assessed topics, indicating a need for further investigation.

Interpretation:

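The study indicates that LLMs can produce high-quality patient information leaflets, which may improve patient understanding of urological conditions. PaLM 2 achieved the highest average quality score, although its advantage over the other models was not statistically significant.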

Limitations:
  • The study evaluated only three LLMs and four urological topics, which limits the generalizability of the findings. In addition, the leaflets were scored by a panel of clinicians, which may introduce scoring bias and affect the reliability of the results.

Conclusion:

LLMs show promise for generating patient information leaflets that are both accurate and accessible, with PaLM 2 achieving the highest average quality score in this study. Future research should evaluate a broader range of LLMs and topics to validate these findings.