Benchmark evaluation of multi-modal large language models for ophthalmic diagnosis in real world

By Shoujun Huang, Junhong Chen, Jiaoman Wang, Ping Zhang, Wending Du, Yuan Hong, Dexing Kong, Wei Lou, Mingying Lai, Weihua Yang
June 22, 2026

Frontiers In Medicine

Objective:

To evaluate the diagnostic capabilities of multi-modal large language models (MLLMs) in ophthalmology using a curated benchmark dataset.

Approach:

Key Findings:

Models such as HAIBU-ReMUD and ChatGPT-4o achieved strong diagnostic accuracy and consistency.
Performance of some models approached that of human experts in specific settings.

Interpretation:

Limitations:

The evaluation focused on models not primarily optimized for ophthalmology.
Performance may vary across different clinical contexts and specific ophthalmic tasks.

Conclusion:

The study provides a foundation for further exploration of MLLMs in ophthalmic diagnosis.
