Initial Assessment of MedSigLIP in Thyroid Cytology: A Comparative Benchmark with Frozen Encoders Versus ImageNet-Pretrained Models

By
Mehmet Poyrazer
Rıdvan Erten
April 10, 2026

Frontiers in Endocrinology

Overview

This study compares MedSigLIP, a medical-domain encoder, with general-purpose ImageNet-pretrained encoders for classifying thyroid FNAB cytology, using all encoders frozen. EfficientNet-B0 achieved the highest overall macro-F1 (0.845), but MedSigLIP showed the best calibration (ECE = 0.025) and the highest recall for the challenging Bethesda V category (0.808), suggesting potential benefits for clinical triage.

Background

Thyroid nodules are common, with fine-needle aspiration biopsy (FNAB) cytology serving as the primary diagnostic tool to stratify malignancy risk. The Bethesda System classifies cytology into categories ranging from benign to malignant, but indeterminate categories, especially Bethesda V (Suspicious for Malignancy), pose diagnostic challenges due to subjective interpretation and interobserver variability. Deep learning models pretrained on natural images are commonly used for classification, but domain-specific medical encoders like MedSigLIP may offer improved performance and reliability in this specialized context.

Data Highlights

Model	Macro-F1 (mean ± SD)	Recall, Bethesda V	Expected Calibration Error (ECE)
EfficientNet-B0	0.845 ± 0.021	Not specified	0.044–0.082 (range across general-purpose encoders)
MedSigLIP	0.836 ± 0.019	0.808	0.025
ResNet50	0.829 ± 0.015	Not specified	0.044–0.082 (range across general-purpose encoders)
ViT-Base	0.817 ± 0.020	Not specified	0.044–0.082 (range across general-purpose encoders)
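For readers less familiar with the metrics in the table, the following is a minimal sketch of how macro-F1 and per-class recall are conventionally computed. It is not the study's evaluation code. The helper names are hypothetical, and the assumption that "mean ± SD" summarizes repeated runs (for example cross-validation folds) is ours, since the summary does not say how the repeats were generated.

```python
from statistics import mean, stdev


def per_class_recall(y_true, y_pred, label):
    """Recall for one class: true positives / all samples whose true label is `label`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    support = sum(1 for t in y_true if t == label)
    return tp / support if support else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def mean_sd(run_scores):
    """Summarize one score per run as (mean, sample SD). Assumes at least two runs."""
    return mean(run_scores), stdev(run_scores)


# Toy labels for illustration only; these are not data from the study.
y_true = ["II", "II", "V", "V", "VI", "VI"]
y_pred = ["II", "V", "V", "V", "VI", "II"]
print(macro_f1(y_true, y_pred, ["II", "V", "VI"]))
print(per_class_recall(y_true, y_pred, "V"))
```

Because macro-F1 weights every class equally, a model can rank lower on macro-F1 and still have the highest recall on a single class such as Bethesda V, which is the pattern the table shows for MedSigLIP.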

Key Findings

EfficientNet-B0 achieved the highest macro-F1 score (0.845), a statistically significant improvement over ViT-Base but not over MedSigLIP (0.836).
MedSigLIP showed the highest recall (0.808) for the challenging Bethesda V (Suspicious) category.
MedSigLIP had the best calibration with the lowest Expected Calibration Error (ECE = 0.025) compared to general-purpose encoders (ECE range 0.044–0.082).
No statistically significant difference in overall classification performance was found between MedSigLIP and EfficientNet-B0 after correction for multiple comparisons.
For clinical utility, calibration and sensitivity for borderline cases matter in addition to aggregate accuracy.
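The calibration finding above rests on Expected Calibration Error. As a hedged illustration, the sketch below implements the standard equal-width-bin form of ECE: predictions are grouped by confidence, and the gap between mean confidence and observed accuracy in each bin is averaged, weighted by bin size. The choice of 10 bins and the function name are assumptions; the summary does not state which binning scheme the authors used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over equal-width confidence bins.

    confidences: top-class predicted probability per sample, each in [0, 1].
    correct:     booleans, True where the top-class prediction was right.
    n_bins:      number of bins (10 is a common default, assumed here).
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy values for illustration only; these are not data from the study.
toy_ece = expected_calibration_error(
    [0.95, 0.95, 0.55, 0.55], [True, True, True, False]
)
```

An ECE of 0 would mean that stated confidence matches observed accuracy in every bin. Lower values, such as the 0.025 reported for MedSigLIP, indicate that the model's confidence scores can be read more literally.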

Clinical Implications

In thyroid cytology workflows, AI model selection should weigh not only accuracy but also calibration and sensitivity for indeterminate Bethesda V cases, in order to reduce overconfident misclassification. Well-calibrated models like MedSigLIP may enhance triage decisions and enable selective expert review, potentially improving patient management. Prospective validation in real-world clinical settings is needed to confirm these benefits.
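One way to picture selective expert review is a confidence threshold: predictions at or above it go to a standard queue, and the rest are referred to a cytopathologist. The sketch below is purely illustrative. The `triage` function, the tuple layout, and the 0.80 threshold are hypothetical and are not taken from the study; any real threshold would need to be set and validated clinically.

```python
def triage(predictions, threshold=0.80):
    """Split predictions into a standard queue and an expert-review queue.

    predictions: iterable of (case_id, predicted_category, confidence) tuples.
    threshold:   hypothetical cut-off; confidences below it are referred.
    """
    standard, review = [], []
    for case_id, category, confidence in predictions:
        target = standard if confidence >= threshold else review
        target.append((case_id, category, confidence))
    return standard, review


# Toy cases for illustration only; these are not data from the study.
cases = [("case-a", "II", 0.97), ("case-b", "V", 0.62), ("case-c", "VI", 0.80)]
standard, review = triage(cases)
print([c[0] for c in review])
```

A scheme like this is only as trustworthy as the confidence scores behind it, which is why calibration matters here: with a poorly calibrated model, a confident but wrong call would bypass review.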

Conclusion

MedSigLIP, a domain-specific pretrained medical encoder, offers better calibration and higher sensitivity for suspicious (Bethesda V) thyroid cytology cases than ImageNet-pretrained models, with no statistically significant loss in overall performance. These attributes support its potential role in clinical decision support for thyroid nodule evaluation.

Related Resources & Content

Google Health AI Developer Foundations/2024 -- MedSigLIP: Medical Image-Text Pretrained Encoder
ThyroidEffi 1.0 Dataset/2024 -- Benchmark Dataset for Thyroid Cytology Classification

Original Source(s)

An early evaluation of MedSigLIP in thyroid cytology: a comparative frozen-encoder benchmark against ImageNet-pretrained encoders
