Objective:
To compare the performance of four deep learning models (ResNet50, DenseNet121, VGG16, and GoogLeNet) in differentiating benign from malignant thyroid nodules on multimodal ultrasound images.
Approach:
Data Collection: 15,373 multimodal ultrasound images, including B-mode, superb microvascular imaging (SMI), and shear-wave elastography (SWE), were divided into training (N = 11,530) and validation (N = 3,843) cohorts.
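The reported cohort sizes correspond to roughly a 75/25 split (11,530 / 15,373 ≈ 0.750). A minimal sketch of such a split follows, assuming a simple seeded random shuffle over image indices. The summary does not state how the split was actually performed (for example, whether images were grouped by patient), and `split_indices`, `train_fraction`, and the seed are hypothetical names chosen for illustration.

```python
import random


def split_indices(n_total, train_fraction, seed=0):
    """Shuffle indices 0..n_total-1 and split them into training and validation lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(n_total * train_fraction)
    return indices[:n_train], indices[n_train:]


# Assumption: a 75/25 random split, which reproduces the reported cohort sizes.
train_idx, val_idx = split_indices(15373, 0.75)
print(len(train_idx), len(val_idx))  # 11530 3843
```

In practice, images from the same nodule or patient would normally be kept in the same cohort to avoid leakage; this sketch does not model that.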
Key Findings:
ResNet50 achieved the highest diagnostic performance (AUC: 0.931), indicating strong capability in distinguishing between benign and malignant nodules.
DenseNet121, VGG16, and GoogLeNet achieved lower AUCs of 0.857, 0.846, and 0.811, respectively.
ResNet50's diagnostic performance was significantly better than that of each of the other models (all P < 0.001).
ResNet50's accuracy (0.871) was higher than that of junior radiologists (0.810), comparable to that of intermediate radiologists (0.886), and lower than that of senior radiologists (0.946).
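The two metrics reported above, AUC and accuracy, can be sketched in plain Python as follows. This is an illustrative sketch only: the labels and scores are toy values, not data from the study, and the function names and the 0.5 decision threshold are assumptions. AUC is computed here as the fraction of (malignant, benign) pairs in which the malignant case receives the higher score, with ties counted as one half.

```python
def auc_score(labels, scores):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy example (1 = malignant, 0 = benign); these values are NOT from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc = auc_score(labels, scores)
acc = accuracy(labels, scores)
print(f"AUC = {auc:.3f}, accuracy = {acc:.3f}")  # AUC = 0.889, accuracy = 0.667
```

AUC is independent of the decision threshold, whereas accuracy depends on it, which is one reason a model's AUC ranking and its accuracy relative to human readers are reported separately.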
Interpretation:
The multimodal ultrasound-based deep learning models demonstrate satisfactory performance in differentiating benign and malignant thyroid nodules, with ResNet50 showing the highest performance.
Limitations:
The study is retrospective, so inherent biases may have affected the results.
The models were not evaluated in a real-world clinical setting, which may limit the generalizability of the findings.
Conclusion:
The multimodal ultrasound-based deep learning models achieve satisfactory performance in differentiating benign and malignant thyroid nodules, with ResNet50 demonstrating the highest performance among the evaluated models.