Objective:
To evaluate MedSigLIP, a domain-specific medical encoder, against standard ImageNet-pretrained models in thyroid cytology classification.
Key Findings:
EfficientNet achieved the highest macro-F1 score (0.845 ± 0.021), followed by MedSigLIP (0.836 ± 0.019), a gap of 0.009 that is smaller than the reported variation of either score.
MedSigLIP demonstrated the highest recall for the Suspicious class (0.808) and the lowest expected calibration error (ECE = 0.025), meaning it missed fewer Suspicious cases and its confidence scores tracked its actual accuracy most closely.
The difference in performance between EfficientNet and MedSigLIP was not statistically significant after multiple comparison correction, so the results do not establish that either model outperforms the other.
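As a reading aid for the calibration figure above, the following is a minimal sketch of how an expected calibration error is commonly computed. It is an illustration, not the study's implementation: the source does not state the binning scheme, so the equal-width bins, the default of 10 bins, and the function name `expected_calibration_error` are all assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (assumed scheme; the study's binning is not stated).

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction matched the label.
    Returns the sample-weighted mean of |accuracy - mean confidence| per bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin by its confidence; 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

A model that reports 0.9 confidence on four predictions but gets only three right has a gap of 0.15 between confidence and accuracy in that bin; an ECE of 0.025 corresponds to an average gap of 2.5 percentage points under this definition.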
Interpretation:
Although MedSigLIP did not outperform EfficientNet on macro-F1, it showed superior calibration and sensitivity for borderline cases, suggesting potential utility in clinical workflows where overconfident misclassification of such cases is costly.
Limitations:
The findings are based on a single dataset and may not generalize to other populations.
Prospective validation in real-world triage workflows is needed to confirm these findings.
Conclusion:
Encoder selection for thyroid cytology should prioritize calibration and sensitivity for borderline cases over aggregate accuracy. MedSigLIP shows promise in reducing overconfident misclassification in Bethesda V cases.