Rethinking Annotation Granularity for Overcoming Shortcuts in Deep Learning-based Radiograph Diagnosis: A Multicenter Study

Radiol Artif Intell. 2022 Jul 20;4(5):e210299. doi: 10.1148/ryai.210299. eCollection 2022 Sep.

Abstract

Purpose: To evaluate the ability of fine-grained annotations to overcome shortcut learning in deep learning (DL)-based diagnosis using chest radiographs.

Materials and methods: Two DL models, CheXNet and CheXDet, were developed using radiograph-level annotations (disease present: yes or no) and fine-grained lesion-level annotations (lesion bounding boxes), respectively. A total of 34 501 chest radiographs obtained from January 2005 to September 2019 were retrospectively collected and annotated for cardiomegaly, pleural effusion, mass, nodule, pneumonia, pneumothorax, tuberculosis, fracture, and aortic calcification. The internal classification and lesion localization performance of the models was compared on an internal testing set (n = 2922); external classification performance was compared on the National Institutes of Health (NIH) Google (n = 4376) and PadChest (n = 24 536) datasets; and external lesion localization performance was compared on the NIH ChestX-ray14 dataset (n = 880). The models were also compared with radiologist performance on a subset of the internal testing set (n = 496). Performance was evaluated using receiver operating characteristic (ROC) curve analysis.
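For illustration, the following is a minimal, hypothetical Python sketch of the per-finding ROC analysis described above, assuming binary ground-truth labels and per-image predicted probabilities from each model on the same test set. The model names are taken from the study, but the arrays and scores are placeholders, not study data, and the paired statistical comparison behind the reported P values (e.g., a DeLong-type test) is not shown.

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-image labels (1 = finding present) and predicted
# probabilities from the two models for a single finding (e.g., fracture).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                              # placeholder labels
p_chexnet = np.clip(0.55 * y_true + 0.50 * rng.random(500), 0, 1)  # placeholder scores
p_chexdet = np.clip(0.70 * y_true + 0.40 * rng.random(500), 0, 1)  # placeholder scores

# Area under the ROC curve for each model on the same test set.
print(f"CheXNet AUC: {roc_auc_score(y_true, p_chexnet):.2f}")
print(f"CheXDet AUC: {roc_auc_score(y_true, p_chexdet):.2f}")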

Results: Given sufficient training data, both models performed comparably to radiologists. CheXDet achieved significantly better external classification performance, for example when classifying fracture on NIH Google (CheXDet area under the ROC curve [AUC], 0.67; CheXNet AUC, 0.51; P < .001) and PadChest (CheXDet AUC, 0.78; CheXNet AUC, 0.55; P < .001). CheXDet also achieved higher lesion detection performance than CheXNet for most abnormalities on all datasets, for example when detecting pneumothorax on the internal testing set (CheXDet jackknife alternative free-response ROC [JAFROC] figure of merit [FOM], 0.87; CheXNet JAFROC FOM, 0.13; P < .001) and on NIH ChestX-ray14 (CheXDet JAFROC FOM, 0.55; CheXNet JAFROC FOM, 0.04; P < .001).

Conclusion: Fine-grained annotations overcame shortcut learning and enabled DL models to identify correct lesion patterns, improving the generalizability of the models.

Keywords: Computer-aided Diagnosis; Conventional Radiography; Convolutional Neural Network (CNN); Deep Learning Algorithms; Localization; Machine Learning Algorithms.