Grounded report generation for enhancing ophthalmic ultrasound interpretation using Vision-Language Segmentation models

By Kai Jin, Qixuan Sun, Daohuan Kang, Ziyao Luo, Tao Yu, Wenzheng Han, Yi Zhang, Meng Wang, Danli Shi, Andrzej Grzybowski

January 3, 2026

Improving Ophthalmic Ultrasound Analysis with Vision-Language Segmentation Models

Overview

This study presents a Vision-Language Segmentation (VLS) model that integrates a Vision-Language Model (VLM) with the Segment Anything Model (SAM) to generate diagnostic reports and lesion annotations from ophthalmic ultrasound images. Trained and evaluated on data from multiple hospitals, the model generated more detailed reports than baseline vision-language models, and AI-assisted reporting improved diagnostic accuracy and shortened reporting time.

Background

Ophthalmic ultrasound is essential for diagnosing and managing various eye conditions, including retinal diseases and ocular tumors. However, interpreting these images is time-consuming and requires specialized expertise, and growing data volumes add to that burden. Traditional AI models have improved image classification but cannot generate detailed, interpretable reports. Recent advances in Vision-Language Models (VLMs) and segmentation techniques offer promising ways to improve diagnostic precision and report generation in ophthalmology.

Data Highlights

Dataset                        Patients   Images   Reports   Mean Age (years)   Male (%)
Training                       5,497      37,917   12,649    ~49.5              47.4
Validation                     1,915      12,639   4,197     ~49.7              47.4
Test                           1,919      12,640   4,170     ~49.6              47.4
External Test Set 1 (FAHWM)    269        742      269       50.8               40.1
External Test Set 2 (FAHZC)    70         160      70        57.4               45.7
Total                          9,670      64,098   21,355
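
The Total row can be checked against the per-dataset counts with a few lines of arithmetic. The Python sketch below is illustrative only and is not part of the study: the names `SPLITS`, `column_totals`, and `internal_image_shares` are hypothetical, and the numbers are copied from the table above. It also shows how the internal images are divided among the training, validation, and test sets.

```python
# Counts copied from the Data Highlights table: (patients, images, reports).
# All names here are hypothetical helpers, not taken from the study.
SPLITS = {
    "Training": (5497, 37917, 12649),
    "Validation": (1915, 12639, 4197),
    "Test": (1919, 12640, 4170),
    "External Test Set 1 (FAHWM)": (269, 742, 269),
    "External Test Set 2 (FAHZC)": (70, 160, 70),
}
INTERNAL = ("Training", "Validation", "Test")


def column_totals(splits):
    """Sum patients, images, and reports across all datasets."""
    return tuple(sum(column) for column in zip(*splits.values()))


def internal_image_shares(splits, internal=INTERNAL):
    """Fraction of the internal images that falls in each internal split."""
    total_images = sum(splits[name][1] for name in internal)
    return {name: splits[name][1] / total_images for name in internal}


if __name__ == "__main__":
    patients, images, reports = column_totals(SPLITS)
    print(f"patients={patients} images={images} reports={reports}")
    for name, share in internal_image_shares(SPLITS).items():
        print(f"{name}: {share:.1%} of internal images")
```

The summed counts match the Total row (9,670 patients, 64,098 images, 21,355 reports), and the internal images divide approximately 60/20/20 across training, validation, and test.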

Key Findings

  • The VLS model, which combines a Vision-Language Model with SAM, generated more detailed ophthalmic ultrasound reports than baseline vision-language models.
  • AI-assisted reporting significantly improved diagnostic accuracy and reduced the time required for report generation.
  • The model effectively annotated lesions on images, enhancing interpretability and clinical utility.
  • Clinical evaluation by senior and junior ophthalmologists confirmed the model's effectiveness in real-world diagnostic settings.
  • The approach demonstrated scalability and potential applicability beyond ophthalmology to other medical imaging domains.

Clinical Implications

Combining vision-language report generation with SAM-based segmentation gives clinicians a practical tool for ophthalmic ultrasound analysis: it supplies accurate, interpretable reports together with lesion annotations. This can reduce diagnostic workload and speed up decision-making. Adopting such AI-assisted systems may improve patient care by enabling timely and precise diagnosis across diverse clinical settings.

Conclusion

This study shows that pairing vision-language report generation with precise lesion annotation improves ophthalmic ultrasound interpretation. The approach holds promise for broader application in medical imaging diagnostics, where AI-augmented reporting could support better clinical outcomes.

