Development and Evaluation of a Deep Learning Algorithm for Rib Segmentation and Fracture Detection from Multicenter Chest CT Images

Mingxiang Wu; Zhizhong Chai; Guangwu Qian; Huangjing Lin; Qiong Wang; Liansheng Wang; Hao Chen

doi:10.1148/ryai.2021200248

Radiol Artif Intell. 2021 Jul 21;3(5):e200248. doi: 10.1148/ryai.2021200248. eCollection 2021 Sep.

Affiliation

Department of Radiology, Shenzhen People's Hospital, Luohu, China (M.W.); AI Research Laboratory, Imsight Technology, Nanshan, China (Z.C., H.L.); Peng Cheng Laboratory, Nanshan, China (G.Q.); Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China (Q.W.); Department of Computer Science, School of Informatics, Xiamen University, Xiamen, China (L.W.); and Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong (H.C.).

Abstract

Purpose: To evaluate the performance of a deep learning-based algorithm for automatic detection and labeling of rib fractures from multicenter chest CT images.

Materials and methods: This retrospective study included 10 943 patients (mean age, 55 years; 6418 men) with and without rib fractures who underwent CT at six hospitals between January 1, 2017, and December 30, 2019. The patients were divided into one training set (n = 2425), two lesion-level test sets (n = 362 and 105), and one examination-level test set (n = 8051). Rib fracture detection performance was assessed with the free-response receiver operating characteristic (FROC) score (mean sensitivity at seven different false-positive rates), precision, sensitivity, and F1 score. Examination-level classification accuracy was evaluated with the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Rib labeling performance was assessed with the mean Dice coefficient and accuracy.
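The lesion-level detection metrics named above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the helper names are hypothetical, each candidate is assumed to be already matched to ground truth as a (score, is_true_positive) pair with at most one true-positive candidate per fracture, and the seven false-positive rates (1/8 to 8 per scan) are borrowed from the LUNA16 convention because the abstract does not list them.

```python
# Minimal sketch of lesion-level detection metrics; helper names are hypothetical.
# Assumes candidates are already matched to ground truth, with at most one
# true-positive candidate per fracture.

LUNA16_FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)  # assumed, not stated in the abstract


def precision_sensitivity_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Lesion-level precision, sensitivity (recall) and their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    total = precision + sensitivity
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f1


def froc_score(candidates, n_lesions: int, n_scans: int, fp_rates=LUNA16_FP_RATES) -> float:
    """Mean sensitivity at the given false-positive-per-scan rates.

    candidates: iterable of (score, is_true_positive) pairs pooled over all scans.
    """
    # Lower the threshold one candidate at a time, from highest score down,
    # and record (false positives per scan, sensitivity) after each step.
    tp = fp = 0
    curve = [(0.0, 0.0)]
    for _score, is_tp in sorted(candidates, key=lambda c: c[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((fp / n_scans, tp / n_lesions))
    # At each target rate, keep the best sensitivity whose FP rate does not exceed it.
    sens = [max(s for f, s in curve if f <= rate) for rate in fp_rates]
    return sum(sens) / len(sens)
```

For example, with four fractures across two scans and candidates ranked 0.9 (TP), 0.8 (TP), 0.7 (FP), 0.6 (TP), 0.5 (FP), 0.4 (FP), sensitivity is 0.5 at 1/8 and 1/4 false positives per scan and 0.75 at the other five rates, so the FROC score is 4.75 / 7, or about 0.679.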

Results: For rib fracture detection, the model achieved an FROC score of 84.3% on test set 1. On test set 2, the algorithm achieved detection performance (precision, 82.2%; sensitivity, 84.9%; F1 score, 83.3%) comparable to that of three radiologists (precision, 81.7%, 98.0%, and 92.0%; sensitivity, 91.2%, 78.6%, and 69.2%; F1 score, 86.1%, 87.2%, and 78.9%). With the aid of the algorithm, the mean sensitivity of the three radiologists improved from 79.7% to 89.2%, while their mean precision remained similar (90.6% vs 88.4%). On test set 3, the model achieved an AUC of 0.93 (95% CI: 0.91, 0.94), a sensitivity of 87.9% (95% CI: 83.7%, 91.4%), and a specificity of 85.3% (95% CI: 74.6%, 89.8%). On a subset of test set 1, the model achieved a mean Dice coefficient of 0.827 and an accuracy of 96.0% for rib segmentation.
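Two quantities in these results can be checked or illustrated in a few lines of Python. The first part reproduces the reported unaided reader averages from the three radiologists' individual values. The second part is a hypothetical sketch of a per-label Dice coefficient; it assumes rib labels are stored as a mapping from voxel coordinate to rib label with background omitted, and that the mean is taken over labels present in either volume, since the abstract does not specify how the mean Dice was aggregated.

```python
# Unaided reader results on test set 2 (values in %), taken from the abstract.
reader_sensitivity = [91.2, 78.6, 69.2]
reader_precision = [81.7, 98.0, 92.0]

mean_sensitivity = round(sum(reader_sensitivity) / len(reader_sensitivity), 1)
mean_precision = round(sum(reader_precision) / len(reader_precision), 1)
print(mean_sensitivity, mean_precision)  # 79.7 90.6


def dice(pred: set, truth: set) -> float:
    """Dice = 2|P & T| / (|P| + |T|); defined as 1.0 when both sets are empty."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def mean_label_dice(pred_labels: dict, true_labels: dict) -> float:
    """Average Dice over every rib label present in either volume.

    Hypothetical layout: each dict maps a voxel coordinate to a rib label,
    with background voxels left out.
    """
    labels = set(pred_labels.values()) | set(true_labels.values())
    if not labels:
        return 1.0
    scores = []
    for label in sorted(labels):
        p = {v for v, lab in pred_labels.items() if lab == label}
        t = {v for v, lab in true_labels.items() if lab == label}
        scores.append(dice(p, t))
    return sum(scores) / len(scores)
```

For example, `mean_label_dice({(0, 0, 0): 1, (0, 0, 1): 1, (0, 1, 0): 2}, {(0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 2})` scores 2/3 for rib 1 and 2/3 for rib 2, giving a mean of 2/3. Averaging per label keeps small ribs from being outweighed by large ones, which a single pooled Dice over all voxels would not do.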

Conclusion: The developed deep learning algorithm was able to detect rib fractures and identify their corresponding anatomic locations on CT images.

Keywords: CT; Ribs.

© 2021 by the Radiological Society of North America, Inc.