Deep learning diagnostic performance and visual insights in differentiating benign and malignant thyroid nodules on ultrasound images

Exp Biol Med (Maywood). 2023 Dec;248(24):2538-2546. doi: 10.1177/15353702231220664. Epub 2024 Jan 26.

Abstract

This study aims to construct and evaluate a deep learning model that uses ultrasound images to differentiate benign from malignant thyroid nodules. The objectives include visualizing the model's decision process for interpretability and comparing its diagnostic performance with that of a cohort of 80 radiologists. We employed ResNet as the classification backbone for thyroid nodule prediction. The model was trained using 2096 ultrasound images of 655 distinct thyroid nodules. For performance evaluation, an independent test set comprising 100 cases of thyroid nodules was curated. In addition, to demonstrate the superiority of the artificial intelligence (AI) model over radiologists, a Turing test was conducted with 80 radiologists of varying clinical experience. This test was meant to assess which group of radiologists reached conclusions in closer alignment with the AI predictions. Furthermore, to highlight the interpretability of the AI model, gradient-weighted class activation mapping (Grad-CAM) was employed to visualize the model's areas of focus during its prediction process. In this cohort, AI diagnostics demonstrated a sensitivity of 81.67%, a specificity of 60%, and an overall diagnostic accuracy of 73%. In comparison, the panel of radiologists exhibited an average diagnostic accuracy of 62.9%. The AI's diagnostic process was significantly faster than that of the radiologists. The generated heat-maps highlighted the model's focus on areas characterized by calcification, solid echo and higher echo intensity, suggesting these areas might be indicative of malignant thyroid nodules. Our study supports the notion that deep learning can be a valuable diagnostic tool with accuracy comparable to that of experienced senior radiologists in the diagnosis of malignant thyroid nodules. The interpretability of the AI model's process suggests that it could be clinically meaningful. Further studies are necessary to improve diagnostic accuracy and support auxiliary diagnoses in primary care settings.
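
The reported sensitivity, specificity and accuracy can be related to one another through the counts of a confusion matrix. The abstract does not state how many of the 100 test cases were malignant or benign, so the following minimal sketch only searches for integer counts that would reproduce the rounded percentages. The function name `consistent_splits` and the rounding tolerance are assumptions made for illustration, not part of the study.

```python
# Hypothetical helper, not taken from the paper: enumerate integer
# confusion-matrix counts for a test set of n_cases that reproduce the
# reported sensitivity, specificity and accuracy (all in percent).

def consistent_splits(n_cases=100, sens_pct=81.67, spec_pct=60.0,
                      acc_pct=73.0, tol=0.005):
    """Return (n_malignant, true_positives, true_negatives) tuples whose
    metrics match the reported percentages within the rounding tolerance."""
    matches = []
    for n_malignant in range(1, n_cases):
        n_benign = n_cases - n_malignant
        for tp in range(n_malignant + 1):
            # Sensitivity = TP / (number of malignant cases)
            if abs(100.0 * tp / n_malignant - sens_pct) > tol:
                continue
            for tn in range(n_benign + 1):
                # Specificity = TN / (number of benign cases)
                if abs(100.0 * tn / n_benign - spec_pct) > tol:
                    continue
                # Accuracy = (TP + TN) / (all cases)
                if abs(100.0 * (tp + tn) / n_cases - acc_pct) <= tol:
                    matches.append((n_malignant, tp, tn))
    return matches


if __name__ == "__main__":
    for n_malignant, tp, tn in consistent_splits():
        print(f"malignant={n_malignant}, benign={100 - n_malignant}, "
              f"TP={tp}, TN={tn}")
```

Under these assumptions the search returns a single candidate: 60 malignant and 40 benign nodules, with 49 true positives (49/60 ≈ 81.67%) and 24 true negatives (24/40 = 60%), giving (49 + 24)/100 = 73% accuracy. This split is an inference from the rounded figures, not a value reported in the abstract.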

Keywords: AI interpretability; Grad-CAM; ResNet; Thyroid nodules; deep learning; diagnostic accuracy; ultrasound images.

MeSH terms

  • Artificial Intelligence
  • Deep Learning*
  • Humans
  • Retrospective Studies
  • Thyroid Nodule* / diagnostic imaging
  • Thyroid Nodule* / pathology
  • Ultrasonography / methods