Benchmark evaluation of multi-modal large language models for ophthalmic diagnosis in real world

By Shoujun Huang, Junhong Chen, Jiaoman Wang, Ping Zhang, Wending Du, Yuan Hong, Dexing Kong, Wei Lou, Mingying Lai, Weihua Yang
June 22, 2026

Frontiers in Medicine

1. The study evaluated nine leading multimodal large language models (MLLMs) for their diagnostic capabilities in ophthalmology using a curated dataset.
2. A benchmark dataset of 295 pathologically confirmed ophthalmic cases was created, integrating clinical narratives and medical images.
3. Models like HAIBU-ReMUD and ChatGPT-4o demonstrated strong diagnostic accuracy, with performance nearing that of human experts.
4. The evaluation focused on open-ended clinical question answering, multimodal information integration, and natural language reasoning.
5. The dataset included diverse ophthalmic cases, ensuring broad coverage across various subspecialties and excluding cases with insufficient data.
