Diagnostic performance of multimodal ultrasound-based deep learning models in differentiating benign and malignant thyroid nodules - Summary - MDSpire

By Huajie Ding, Lei Na, Meiling Hao, Wanlou Chen, Zhen Zhang

June 29, 2026

Objective:

To explore the performance of different deep learning models, including ResNet50, DenseNet121, VGG16, and GoogLeNet, for differentiating benign and malignant thyroid nodules based on multimodal ultrasound images.

Approach:
  • Data Collection: 15,373 multimodal ultrasound images, including B-mode, superb microvascular imaging (SMI), and shear-wave elastography (SWE), were divided into training (N = 11,530) and validation (N = 3,843) cohorts.
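
The cohort sizes reported above correspond to roughly a 3:1 split. A quick arithmetic check, using only the figures given in this summary (the variable names are illustrative and not taken from the study):

```python
# Image counts as reported in the summary.
total_images = 15_373
train_images = 11_530
val_images = 3_843

# The two cohorts should account for every image.
assert train_images + val_images == total_images

train_fraction = train_images / total_images
val_fraction = val_images / total_images
print(f"training: {train_fraction:.1%}, validation: {val_fraction:.1%}")
# prints "training: 75.0%, validation: 25.0%"
```
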
Key Findings:
  • ResNet50 achieved the highest diagnostic performance (AUC: 0.931), indicating strong capability in distinguishing between benign and malignant nodules.
  • DenseNet121, VGG16, and GoogLeNet had AUCs of 0.857, 0.846, and 0.811, respectively, showing varying levels of diagnostic performance.
  • ResNet50 performed significantly better than each of the other three models (all P < 0.001).
  • ResNet50's accuracy (0.871) was higher than that of junior radiologists (0.810) and comparable to that of intermediate radiologists (0.886), but lower than that of senior radiologists (0.946).
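
The findings above are reported as AUC and accuracy. The summary does not say how these were computed, so the following is only a minimal sketch of the standard definitions: AUC as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one, and accuracy as the fraction of correct calls at a fixed threshold. The function names, the 0.5 threshold, and the toy data are all hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: the share of (malignant, benign) pairs
    in which the malignant case has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data (1 = malignant, 0 = benign); not from the study.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

print(roc_auc(labels, scores))   # 0.9375
print(accuracy(labels, scores))  # 0.75
```

AUC depends only on how the scores rank the cases, while accuracy depends on the chosen threshold, which is one reason the two metrics can order models or readers differently.
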
Interpretation:

All four multimodal ultrasound-based deep learning models performed satisfactorily in differentiating benign from malignant thyroid nodules (AUCs from 0.811 to 0.931), with ResNet50 performing best.

Limitations:
  • The study is retrospective, so inherent biases may have affected the results.
  • The models were not evaluated in a real-world clinical setting, which may limit the generalizability of the findings.
Conclusion:

The multimodal ultrasound-based deep learning models achieve satisfactory performance in differentiating benign and malignant thyroid nodules, with ResNet50 demonstrating the highest performance among the evaluated models.
