Grounded report generation for enhancing ophthalmic ultrasound interpretation using Vision-Language Segmentation models

By
Kai Jin
Qixuan Sun
Daohuan Kang
Ziyao Luo
Tao Yu
Wenzheng Han
Yi Zhang
Meng Wang
Danli Shi
Andrzej Grzybowski
January 3, 2026

npj Digital Medicine

At a Glance

Category	Detail
Condition	Ocular conditions including retinal diseases and ocular tumors
Key Mechanisms	Integration of a Vision-Language Model (VLM) with the Segment Anything Model (SAM) for image segmentation and report generation
Target Population	Patients undergoing ophthalmic ultrasound imaging across multiple hospitals
Care Setting	Ophthalmology clinical settings utilizing ultrasound diagnostics

Key Highlights

A novel AI model combines visual understanding and natural language processing to generate comprehensive diagnostic reports with lesion annotations.
Pairing a Vision-Language Segmentation (VLS) model with SAM enables precise lesion segmentation and interpretable report generation.
AI-assisted ocular ultrasound reporting improved diagnostic accuracy and reduced reporting time in evaluations by senior and junior ophthalmologists.
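
To make the idea of a report with lesion annotations concrete, the following is a minimal sketch of how a grounded report could be represented in code: each sentence is stored together with the image region that supports it. This is an illustration only, not the authors' implementation. The names `Finding`, `GroundedReport`, and `render`, the bounding-box layout, and the placeholder values are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical structures: none of these names come from the paper.
# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
BoundingBox = Tuple[int, int, int, int]


@dataclass
class Finding:
    """One report sentence tied to the image region that supports it."""
    text: str
    label: str
    box: BoundingBox


@dataclass
class GroundedReport:
    """A report whose sentences each point back to a region of the image."""
    image_id: str
    findings: List[Finding] = field(default_factory=list)

    def render(self) -> str:
        """Return the report as text, with each finding followed by its region."""
        lines = [f"Image: {self.image_id}"]
        for index, finding in enumerate(self.findings, start=1):
            lines.append(f"{index}. {finding.text} [{finding.label} at {finding.box}]")
        return "\n".join(lines)


# Placeholder content only; this is not data from the study.
report = GroundedReport(image_id="example-001")
report.findings.append(
    Finding(text="Placeholder finding sentence.", label="lesion", box=(40, 60, 120, 140))
)
print(report.render())
```

Keeping the region next to the sentence means a reviewing ophthalmologist can check each statement against the part of the image it refers to.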

Guideline-Based Recommendations

Diagnosis

Utilize ophthalmic ultrasound imaging for detailed structural assessment of ocular conditions.
Incorporate AI models that combine image analysis with natural language report generation to enhance diagnostic precision.

Management

Adopt AI-assisted reporting tools to support clinical decision-making and personalized patient care.
Leverage lesion segmentation outputs to guide treatment planning and monitoring.

Monitoring & Follow-up

Use AI-generated reports to track disease progression and response to therapy over time.
Regularly evaluate AI model performance and update with clinical feedback to maintain reliability.

Risks

Be aware of challenges in model interpretability and reliability in clinical settings.
Ensure AI outputs are reviewed by qualified ophthalmologists to prevent misinterpretation.

Patient & Prescribing Data

The dataset comprised 9,670 patients from three hospitals, with diverse ocular conditions, a balanced gender distribution, and a mean age of around 50 years.

AI-assisted ultrasound reporting demonstrated higher diagnostic accuracy and significantly reduced reporting time, supporting its use as an auxiliary diagnostic tool.

Clinical Best Practices

Combine advanced Vision-Language Models with segmentation techniques for comprehensive ophthalmic ultrasound analysis.
Engage both senior and junior ophthalmologists in evaluating AI-generated reports to ensure clinical relevance.
Integrate AI tools into existing workflows to manage increasing ultrasound data volume efficiently.
Continuously validate AI model performance using diverse, real-world datasets to ensure generalizability.
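
As one way to carry out the ongoing validation described above, segmentation output can be compared against expert-drawn reference masks with an overlap score. The sketch below uses the Dice coefficient, a widely used overlap measure for segmentation. This summary does not state which metrics the study reported, so the choice of Dice, the function name `dice_coefficient`, and the toy masks are assumptions for illustration.

```python
from typing import Sequence

# A mask is a grid of 0/1 pixels, given as rows of integers (an assumed format).
Mask = Sequence[Sequence[int]]


def dice_coefficient(predicted: Mask, reference: Mask) -> float:
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks."""
    pred_pixels = [p for row in predicted for p in row]
    ref_pixels = [r for row in reference for r in row]
    if len(pred_pixels) != len(ref_pixels):
        raise ValueError("masks must contain the same number of pixels")
    overlap = sum(1 for p, r in zip(pred_pixels, ref_pixels) if p and r)
    total = sum(1 for p in pred_pixels if p) + sum(1 for r in ref_pixels if r)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * overlap / total


# Toy masks, not study data: 2 overlapping pixels, 3 foreground pixels in each mask.
predicted_mask = [[0, 1, 1],
                  [0, 1, 0]]
reference_mask = [[0, 1, 0],
                  [0, 1, 1]]
score = dice_coefficient(predicted_mask, reference_mask)
print(round(score, 3))
```

Tracking such a score per hospital or per condition over time is one simple way to notice when performance drifts on new data.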
