Comparison of diagnostic performance of the current score-based ultrasound risk stratification systems according to thyroid nodule size

Quant Imaging Med Surg. 2024 Dec 5;14(12):9234-9245. doi: 10.21037/qims-24-282. Epub 2024 Nov 6.

Abstract

Background: The lack of standardization in risk stratification systems (RSSs) has led to uncertainty in selecting the most effective RSS for assessing malignancy risk in thyroid nodules. Therefore, this study aimed to compare the diagnostic performance of four current score-based RSSs according to thyroid nodule size, with the goal of determining the most effective RSS and aiding clinical decision-making.

Methods: Between July 2013 and January 2019, a total of 2,667 consecutive patients presenting with 3,944 thyroid nodules were pathologically diagnosed after thyroidectomy and/or ultrasound (US)-guided fine-needle aspiration (FNA). These nodules were retrospectively divided into two groups: small nodules (<1 cm) and large nodules (≥1 cm). The four RSSs were used to assign US categories, and diagnostic performance was computed and compared according to thyroid nodule size, both before and after the application of size thresholds for biopsy.
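As an illustration of the performance measures compared in this study, the following minimal Python sketch computes them from a 2×2 table of RSS biopsy recommendation against pathology. The abstract does not define the measures, so the definitions here are assumptions: the FNA rate is taken as the share of all nodules flagged for biopsy, and the unnecessary FNA rate as the share of flagged nodules that proved benign. The names `Counts` and `metrics` are hypothetical, and the counts are invented for illustration and are not study data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """2x2 table of RSS biopsy recommendation vs. pathology (hypothetical helper)."""
    tp: int  # malignant nodules flagged for FNA
    fp: int  # benign nodules flagged for FNA
    fn: int  # malignant nodules not flagged
    tn: int  # benign nodules not flagged


def metrics(c: Counts) -> dict:
    """Compute performance measures; assumes every denominator is nonzero."""
    total = c.tp + c.fp + c.fn + c.tn
    flagged = c.tp + c.fp
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / total,
        # Assumed definition: flagged nodules over all nodules.
        "fna_rate": flagged / total,
        # Assumed definition: benign flagged nodules over all flagged nodules.
        "unnecessary_fna_rate": c.fp / flagged,
    }


# Illustrative counts only; these are not taken from the study.
example = metrics(Counts(tp=90, fp=30, fn=10, tn=70))
for name, value in example.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed definitions, withholding biopsy from some flagged nodules changes `flagged` and therefore both FNA measures, which is why the comparison is reported before and after size thresholds are applied.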

Results: After thyroidectomy or biopsy, 1,781 (45.2%) thyroid nodules were found to be malignant. (I) After applying size thresholds for biopsy in ≥1 cm nodules, the highest specificity, accuracy, and area under the curve (AUC) and the lowest FNA rate and unnecessary FNA rate were observed in the Artificial Intelligence-Thyroid Imaging Reporting And Data System (AI-TIRADS) (66.1%, 75.3%, 0.785, 55.1%, and 38.6%, respectively, P<0.05 for all). (II) Before applying size thresholds for biopsy in ≥1 cm nodules, the FNA rate and unnecessary FNA rate of the four RSSs were lower than they were after the application of the size threshold: American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS), 59.1% versus 61.4%, 39.8% versus 45.4%; AI-TIRADS, 52.3% versus 55.1%, 34.0% versus 38.6%; TIRADS issued by Kwak et al. (Kwak-TIRADS), 52.5% versus 76.1%, 34.4% versus 52.1%; Chinese Thyroid Imaging Reporting and Data System (C-TIRADS), 51.5% versus 66.2%, 34.4% versus 50.1% (P<0.05 for all). (III) The small nodules showed higher sensitivity and lower specificity than the large nodules (ACR-TIRADS, 97.7% versus 95.5%, 46.2% versus 62.5%; AI-TIRADS, 97.2% versus 92.7%, 49.9% versus 71.6%; Kwak-TIRADS, 97.2% versus 92.5%, 49.7% versus 71.3%; C-TIRADS, 94.2% versus 90.7%, 55.0% versus 71.8%, respectively, all P<0.05).

Conclusions: A potentially effective strategy for managing large nodules with the current score-based RSSs could be to rely solely on US categories rather than size thresholds for biopsy. Additionally, before size thresholds for biopsy were applied, small nodules showed higher sensitivity and lower specificity than large nodules. These findings suggest a possible new management strategy for large nodules and provide a basis for managing small nodules.

Keywords: Ultrasonography; fine-needle aspiration (FNA); risk stratification system (RSS); thyroid nodule.