Improving Ophthalmic Ultrasound Analysis with Vision-Language Segmentation Models
Overview
This study presents a Vision-Language Segmentation (VLS) model that integrates a vision-language model with the Segment Anything Model (SAM) to generate comprehensive diagnostic reports and precise lesion annotations from ophthalmic ultrasound images. Trained and evaluated on large datasets from multiple hospitals, the model produced better reports than baseline vision-language models, and AI-assisted reporting was more accurate and faster than unassisted reporting.
Background
Ophthalmic ultrasound is essential for diagnosing and managing various eye conditions, including retinal diseases and ocular tumors. However, interpreting these images is time-consuming and requires specialized expertise, and growing data volumes put that expertise under increasing strain. Traditional AI models have improved image classification but cannot generate detailed, interpretable reports. Recent advances in Vision-Language Models (VLMs) and segmentation techniques offer promising avenues to enhance diagnostic precision and report generation in ophthalmology.
Data Highlights
Dataset                     | Patients | Images | Reports | Mean Age (years) | Male (%)
Training                    | 5,497    | 37,917 | 12,649  | ~49.5            | 47.4
Validation                  | 1,915    | 12,639 | 4,197   | ~49.7            | 47.4
Test                        | 1,919    | 12,640 | 4,170   | ~49.6            | 47.4
External Test Set 1 (FAHWM) | 269      | 742    | 269     | 50.8             | 40.1
External Test Set 2 (FAHZC) | 70       | 160    | 70      | 57.4             | 45.7
Total                       | 9,670    | 64,098 | 21,355  |                  |
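The totals and split proportions implied by the counts above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the study: the names `SPLITS`, `REPORTED_TOTAL`, `column_totals`, and `image_share` are hypothetical and introduced only for illustration, and the numbers are transcribed directly from the dataset counts listed above.

```python
# Counts transcribed from the Data Highlights section: (patients, images, reports).
# All names here are illustrative assumptions, not identifiers from the study.
SPLITS = {
    "Training": (5497, 37917, 12649),
    "Validation": (1915, 12639, 4197),
    "Test": (1919, 12640, 4170),
    "External Test Set 1 (FAHWM)": (269, 742, 269),
    "External Test Set 2 (FAHZC)": (70, 160, 70),
}
REPORTED_TOTAL = (9670, 64098, 21355)


def column_totals(splits):
    """Sum patients, images, and reports across all splits."""
    return tuple(sum(column) for column in zip(*splits.values()))


def image_share(splits):
    """Return the fraction of all images that falls in each split."""
    total_images = sum(images for _, images, _ in splits.values())
    return {name: images / total_images for name, (_, images, _) in splits.items()}


if __name__ == "__main__":
    # The per-split counts should add up to the reported Total row.
    assert column_totals(SPLITS) == REPORTED_TOTAL
    for name, share in image_share(SPLITS).items():
        _, images, reports = SPLITS[name]
        print(f"{name}: {share:.1%} of images, {images / reports:.2f} images per report")
```

Run as a script, this confirms that the five splits sum to the reported totals and prints each split's share of the images. In both external test sets the report count equals the patient count, that is, one report per patient.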
Key Findings
The integrated VLS model combining Vision-Language Models and SAM achieved superior performance in generating detailed ophthalmic ultrasound reports compared to baseline VL models.
AI-assisted reporting significantly improved diagnostic accuracy and reduced the time required for report generation.
The model effectively annotated lesions on images, enhancing interpretability and clinical utility.
Clinical evaluation by senior and junior ophthalmologists confirmed the model's effectiveness in real-world diagnostic settings.
The approach demonstrated scalability and potential applicability beyond ophthalmology to other medical imaging domains.
Clinical Implications
The VLS model, which pairs a vision-language model with SAM, offers a practical tool for ophthalmic ultrasound analysis that augments clinician workflow with accurate, interpretable reports and lesion annotations. This can reduce diagnostic workload and speed up decision-making. Adopting such AI-assisted systems may improve patient care by enabling timely and precise diagnosis across diverse clinical settings.
Conclusion
This study demonstrates that combining advanced vision-language segmentation models with precise lesion annotation significantly advances ophthalmic ultrasound interpretation. The approach holds promise for broader application in medical imaging diagnostics, facilitating improved clinical outcomes through AI-augmented reporting.
References
Author/Source/Year -- Improving Ophthalmic Ultrasound Analysis through Grounded Report Generation with Vision-Language Segmentation Models