Artificial intelligence (AI) vs. human in hip fracture detection

Nattaphon Twinprai; Artit Boonrod; Arunnit Boonrod; Jarin Chindaprasirt; Wichien Sirithanaphol; Prinya Chindaprasirt; Prin Twinprai

doi:10.1016/j.heliyon.2022.e11266

Heliyon. 2022 Oct 27;8(11):e11266. doi: 10.1016/j.heliyon.2022.e11266. eCollection 2022 Nov.

Authors

Nattaphon Twinprai¹, Artit Boonrod², Arunnit Boonrod³, Jarin Chindaprasirt⁴, Wichien Sirithanaphol⁵, Prinya Chindaprasirt⁶, Prin Twinprai⁷

Affiliations

¹ Trauma Unit, Department of Orthopedics, Srinagarind Hospital, Khon Kaen University, Thailand.
² Sport Unit, Department of Orthopedics, Srinagarind Hospital, Khon Kaen University, Thailand.
³ Neurology Unit, Department of Radiology, Srinagarind Hospital, Khon Kaen University, Thailand.
⁴ Department of Internal Medicine, Srinagarind Hospital, Khon Kaen University, Thailand.
⁵ Department of Surgery, Srinagarind Hospital, Khon Kaen University, Thailand.
⁶ Sustainable Infrastructure Research and Development Center, Department of Civil Engineering, Faculty of Engineering, Khon Kaen University, Thailand.
⁷ Musculoskeletal Unit, Department of Radiology, Srinagarind Hospital, Khon Kaen University, Thailand.

Abstract

Objective: This study aimed to assess the diagnostic accuracy and sensitivity of a YOLOv4-tiny AI model for detecting hip fractures and classifying hip fracture types.

Materials and methods: In this retrospective study, a dataset of 1000 hip and pelvic radiographs was divided into a training set of 450 fracture and 450 normal images (900 images total) and a testing set of 50 fracture and 50 normal images (100 images total). Each training set image was manually annotated with a bounding box drawn around each hip, and each bounding box was manually labeled as (1) normal, (2) femoral neck fracture, (3) intertrochanteric fracture, or (4) subtrochanteric fracture. A deep convolutional neural network YOLOv4-tiny AI model was then trained on the annotated training set images, and its performance was evaluated on the testing set images. Human doctors evaluated the same testing set images, and the performances of the model and the doctors were compared. The testing set contained no images that crossed over from the training set.
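To make the labeling step concrete, the sketch below converts one manually drawn bounding box into the plain-text annotation line that Darknet-style YOLO training commonly uses (class id, then box center and size normalized to the image dimensions). The numeric class ids, the label strings, and the function name `to_yolo_line` are assumptions made for illustration; the abstract names the four labels but does not describe the annotation tooling or file format the authors used.

```python
# Hypothetical class ids: the abstract lists these four labels,
# but the numeric ids and label strings here are assumptions.
CLASS_IDS = {
    "normal": 0,
    "femoral_neck_fracture": 1,
    "intertrochanteric_fracture": 2,
    "subtrochanteric_fracture": 3,
}


def to_yolo_line(label, box, image_size):
    """Return one YOLO-style annotation line for a labeled hip region.

    box is (x_min, y_min, x_max, y_max) in pixels;
    image_size is (width, height) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive area")

    # Normalize the box center and size to the 0..1 range.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h

    return f"{CLASS_IDS[label]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example with made-up pixel coordinates on a 1000 x 800 radiograph.
line = to_yolo_line("femoral_neck_fracture", (100, 200, 300, 400), (1000, 800))
print(line)
```

A radiograph showing both hips would produce two such lines, one per bounding box.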

Results: The output images showed that the AI model produced bounding boxes around each hip region and classified the fracture and normal hip regions with a sensitivity of 96.2%, a specificity of 94.6%, and an accuracy of 95%. The human doctors performed with sensitivities ranging from 69.2% to 96.2%. The model's detection sensitivity was significantly better than that of a general practitioner and first-year residents and equivalent to that of specialist doctors.
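For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and accuracy are derived from the four cells of a confusion matrix. The counts passed in are hypothetical: they were chosen only because they reproduce the reported percentages, and the abstract does not report the underlying confusion matrix.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fractured hips correctly flagged
    specificity = tn / (tn + fp)   # normal hips correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, NOT taken from the paper.
sens, spec, acc = diagnostic_metrics(tp=25, fn=1, tn=70, fp=4)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
# prints: sensitivity=96.2% specificity=94.6% accuracy=95.0%
```

Sensitivity is the figure emphasized in the comparison with human readers, because a missed fracture (a false negative) is the costlier error in this setting.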

Conclusions: This model showed hip fracture detection sensitivity comparable to that of well-trained radiologists and orthopedists and classified hip fractures with high accuracy.

Keywords: Artificial intelligence; Computer vision; Deep learning; Hip fracture; Trauma.