Performance of large language models for ophthalmic literature retrieval

By
Jai Paris
Oliver Kleinig
Ayushi Agarwal
Weng Onn Chan
Dinesh Selva
June 12, 2026

At a Glance

Category	Detail
Key Mechanisms	Evaluation of large language models (LLMs) for literature search and retrieval.

Key Highlights

LLMs demonstrated low recall (0.16–0.41) but high precision (0.78–1.00).
ChatGPT Deep Research achieved the highest mean recall (0.41) and F1 score (0.56).
Performance varied by topic, with higher recall for rarer topics (0.29–0.76).
LLM searches did not identify additional studies beyond manual searches.
Hallucinated citations were essentially absent.
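
The recall, precision, and F1 figures above follow the standard retrieval definitions. A minimal sketch of how such figures could be computed for a single search is given below; it assumes the manual search serves as the reference set, which this summary does not spell out, and the study identifiers and the `retrieval_metrics` helper are hypothetical illustrations, not data or code from the article.

```python
def retrieval_metrics(retrieved, reference):
    """Return (precision, recall, F1) for one search against a reference set."""
    retrieved, reference = set(retrieved), set(reference)
    true_positives = len(retrieved & reference)
    # Precision: share of retrieved studies that are in the reference set.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of reference studies that the search retrieved.
    recall = true_positives / len(reference) if reference else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical identifiers: ten studies found manually, four returned by an LLM.
manual_search = {f"study_{i:02d}" for i in range(1, 11)}
llm_search = {"study_02", "study_05", "study_07", "study_09"}

precision, recall, f1 = retrieval_metrics(llm_search, manual_search)

# Studies the LLM found that the manual search missed (none in this example).
additional = llm_search - manual_search

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
print(f"additional studies: {sorted(additional)}")
```

In this made-up example every LLM result is a genuine reference study, so precision is 1.00, but only four of ten reference studies are found, so recall is 0.40 and F1 is 0.57. This mirrors the high-precision, low-recall pattern reported above.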

LLMs are high-precision tools for rapid literature scoping.

Clinical Best Practices

Use LLMs for quick orientation within unfamiliar evidence bases.
Do not rely on LLMs alone for comprehensive study identification, given their low recall (0.16–0.41).

Related Resources & Content

Source Article

Original Source(s)

Eye

Performance of large language models for ophthalmic literature retrieval

by Jai Paris, Oliver Kleinig, Ayushi Agarwal, Weng Onn Chan, Dinesh Selva
June 12, 2026

