Assessing retina-specific ophthalmic counseling generated by an early public large language model across different levels of clinical urgency - Report - MDSpire

By Dominic M. Choo, Tyler A. Durham, and Kishan G. Patel

July 1, 2026

Clinical Report: Evaluating the Quality of Retina-Focused Ophthalmic Guidance

Overview

This study evaluates the quality of ophthalmic counseling generated by a large language model (LLM) across varying clinical urgency levels for retinal conditions. Findings indicate that counseling accuracy varies significantly with clinical urgency, particularly for retinal detachment scenarios.

Background

Large language models (LLMs) are increasingly used in healthcare, particularly for patient education and counseling. This study examines how an LLM performs when counseling on retinal conditions of differing urgency: age-related macular degeneration (AMD), diabetic retinopathy (DR), and retinal detachment (RD).

Data Highlights

Outcome Measure              | Result
Counseling Accuracy          | Varied by urgency (p = 0.002)
High vs Low Urgency AMD      | No significant difference (p = 0.081)
High vs Low Urgency DR       | No significant difference (p = 0.5)
High vs Low Urgency RD       | Significant difference (p < 0.001)
Readability Requirement      | College graduation level needed
Common Understanding Issues  | Too much medical terminology (49%), non-medical terminology (45%)
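
As a reading aid, the p-values in the table can be checked against a significance threshold. The sketch below assumes the conventional cutoff of 0.05, which the report does not state explicitly; the names `ALPHA`, `comparisons`, and `is_significant` are illustrative and do not come from the study.

```python
# Assumed significance threshold; the report does not name the cutoff it used.
ALPHA = 0.05

# p-values copied from the table above.
comparisons = {
    "Counseling accuracy across urgency levels": 0.002,
    "High vs low urgency AMD": 0.081,
    "High vs low urgency DR": 0.5,
    # Reported as p < 0.001, so 0.001 is used here as an upper bound.
    "High vs low urgency RD": 0.001,
}


def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """Return True when the p-value falls below the chosen threshold."""
    return p_value < alpha


significant = [name for name, p in comparisons.items() if is_significant(p)]

for name, p in comparisons.items():
    label = "significant" if is_significant(p) else "not significant"
    print(f"{name}: p = {p} ({label})")
```

Under that assumed cutoff, only the overall accuracy comparison and the RD comparison are flagged, which matches the wording in the table.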

Key Findings

  • Counseling accuracy differed significantly across clinical urgency levels (p = 0.002).
  • High-urgency AMD and RD scenarios showed significant discrepancies in counseling urgency (p = 0.013 and p < 0.001, respectively).
  • Empathy in counseling did not vary significantly across urgency levels (p = 0.2).
  • Readability assessments indicated that all outputs required a college-graduate reading level to understand.
  • Common barriers to understanding included excessive medical and non-medical terminology.
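
The report does not say which readability formula was used. As one illustration of how a reading grade level can be estimated, the sketch below applies the Flesch-Kincaid grade formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The choice of formula is an assumption, and the syllable counter is a rough vowel-group heuristic rather than a dictionary lookup, so the scores are approximate; the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per run of vowels, minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the U.S. school grade level needed to read the text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    sample = "Retinal detachment is an emergency. Seek care immediately."
    print(round(flesch_kincaid_grade(sample), 1))
```

Longer sentences and more syllables per word both raise the score, which is consistent with the finding that heavy terminology was a common barrier to understanding.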

Clinical Implications

Healthcare professionals should be aware of the limitations in the accuracy and readability of LLM-generated counseling, particularly in high-urgency scenarios.

Conclusion

The study reveals that while LLMs can provide counseling for retinal conditions, their performance varies significantly with clinical urgency.
