Assessing retina-specific ophthalmic counseling generated by an early public large language model across different levels of clinical urgency

By Dominic M. Choo, Tyler A. Durham, and Kishan G. Patel
July 1, 2026

Frontiers in Digital Health

Objective:

To evaluate how the quality of retina-specific ophthalmology counseling provided by an early publicly available large language model (LLM) differs when advising patients with varying clinical characteristics and risk factors.

Approach:

Study Design: Prospective, cross-sectional study in which 18 ophthalmologists rated LLM-generated counseling for six patient vignettes of varying clinical urgency.
Vignette Creation: Six clinical vignettes were constructed, representing high- and low-urgency scenarios for diabetic retinopathy, retinal detachment (RD), and age-related macular degeneration (AMD).
LLM Interaction: The LLM (ChatGPT-3.5) was prompted to provide medical counseling based on the vignettes, with responses rated by independent reviewers.
Readability Assessment: Readability of the counseling outputs was evaluated using qualitative surveys and quantitative metrics.
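
The summary does not name the quantitative readability metrics that were used. As an illustration only, the sketch below computes the Flesch-Kincaid Grade Level, one widely used metric of this kind; its selection here is an assumption, not the study's stated method. The function names and the vowel-group syllable heuristic are hypothetical and approximate, so scores will differ somewhat from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting groups of consecutive vowels (heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent (e.g. "time"), except in "-le" endings (e.g. "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Hypothetical counseling sentences, not taken from the study.
    plain = "See your eye doctor today."
    dense = "Proliferative diabetic retinopathy necessitates immediate ophthalmological evaluation."
    print(round(flesch_kincaid_grade(plain), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

Higher scores correspond to more years of schooling needed to follow the text, so a counseling output written in short sentences with common words should score well below one dense with multisyllabic clinical terms.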

Key Findings:

Counseling accuracy varied with clinical urgency (p = 0.002), particularly for retinal detachment (p < 0.001).
Counseling urgency did not differ significantly from clinical urgency, except in the high-urgency AMD (p = 0.013) and RD (p < 0.001) vignettes.
Empathy in counseling did not differ across clinical urgency (p = 0.2).
Readability assessments indicated that a college-graduate reading level was required to understand all counseling outputs.
Common reasons for difficulty in understanding included excessive medical terminology (49%) and excessive non-medical terminology (45%).

Interpretation:

The LLM-generated counseling outputs were largely similar across retinal vignettes with differing clinical urgency, indicating a need for optimization in LLM prompting for better accuracy and readability.

Limitations:

The study was limited to a single LLM version and a small sample of ophthalmologists, which may limit the generalizability of the findings.
Readability assessments may not fully capture the nuances of patient understanding.

Conclusion:

Future studies should investigate the optimization of LLM prompting to improve counseling accuracy, readability, empathy, and communication of urgency for specific conditions.

Original Source(s)

Assessing retina-specific ophthalmic counseling generated by an early public large language model across different levels of clinical urgency
