Assessing retina-specific ophthalmic counseling generated by an early public large language model across different levels of clinical urgency - Summary - MDSpire

By Dominic M. Choo, Tyler A. Durham, and Kishan G. Patel

July 1, 2026

Objective:

To evaluate how the quality of retina-specific ophthalmology counseling provided by an early publicly available large language model (LLM) differs when advising patients with varying clinical characteristics and risk factors.

Approach:
  • Study Design: Prospective, cross-sectional study in which 18 ophthalmologists rated LLM-generated counseling for six patient vignettes of varying clinical urgency.
  • Vignette Creation: Six clinical vignettes were constructed, one high-urgency and one low-urgency scenario for each of diabetic retinopathy (DR), retinal detachment (RD), and age-related macular degeneration (AMD).
  • LLM Interaction: The LLM (ChatGPT-3.5) was prompted to provide medical counseling based on the vignettes, with responses rated by independent reviewers.
  • Readability Assessment: Readability of the counseling outputs was evaluated using qualitative surveys and quantitative metrics.
Key Findings:
  • Counseling accuracy varied with clinical urgency (p = 0.002), particularly for retinal detachment (p < 0.001).
  • The urgency conveyed in the counseling did not differ significantly from the vignette's clinical urgency in most cases, except for the high-urgency AMD (p = 0.013) and RD (p < 0.001) vignettes.
  • Empathy in the counseling did not differ significantly across levels of clinical urgency (p = 0.2).
  • Readability assessments indicated that every counseling output required a college-graduate reading level to understand.
  • Common reasons for difficulty in understanding included excessive medical (49%) and non-medical (45%) terminology.
Interpretation:

The LLM-generated counseling was largely similar across retinal vignettes despite their differing clinical urgency, suggesting that LLM prompting needs to be optimized to improve accuracy and readability.

Limitations:
  • The study was limited to a single LLM version and a small sample of ophthalmologists, which may affect the generalizability of the findings.
  • Readability assessments may not fully capture the nuances of patient understanding.
Conclusion:

Future studies should investigate the optimization of LLM prompting to improve counseling accuracy, readability, empathy, and communication of urgency for specific conditions.
