Cautious Artificial Intelligence Improves Outcomes and Trust by Flagging Outlier Cases

Abhiraj S Kanse; Nikhil C Kurian; Himanshu P Aswani; Zakia Khan; Peter H Gann; Swapnil Rane; Amit Sethi

doi:10.1200/CCI.22.00067

JCO Clin Cancer Inform. 2022 Oct:6:e2200067. doi: 10.1200/CCI.22.00067.

Authors

Abhiraj S Kanse¹, Nikhil C Kurian¹, Himanshu P Aswani¹, Zakia Khan², Peter H Gann³, Swapnil Rane⁴, Amit Sethi¹

Affiliations

¹ Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India.
² Independent Researcher, Palatine, IL.
³ Department of Pathology, University of Illinois College of Medicine, Chicago, IL.
⁴ Department of Pathology, Tata Memorial Centre-ACTREC, HBNI, Navi Mumbai, India.

PMID: 36228179
DOI: 10.1200/CCI.22.00067

Abstract

Purpose: Artificial intelligence (AI) models for medical image diagnosis are often trained and validated on curated data. However, in a clinical setting, images that are outliers with respect to the training data, such as those representing rare disease conditions or acquired using a slightly different setup, can lead to wrong decisions. It is not practical to expect clinicians to be trained to discount results for such outlier images. Toward clinical deployment, we have designed a method to train cautious AI that can automatically flag outlier cases.

Materials and methods: Our method, ClassClust, forms tight clusters of training images using supervised contrastive learning, which helps it identify outliers during testing. We compared ClassClust's ability to detect outliers with that of three competing methods on four publicly available data sets covering pathology, dermatoscopy, and radiology. We held out certain diseases, artifacts, and types of images from the training data and examined the ability of the various models to detect these as outliers during testing. We also compared the decision accuracy of the models on held-out nonoutlier images and visualized the regions of the images that the models used for their decisions.
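
The idea of flagging test images that fall far from tight training clusters can be illustrated with a minimal sketch. It assumes that image embeddings have already been produced by some trained encoder, and it scores each test embedding by its Euclidean distance to the nearest class centroid. The centroid-distance rule, the threshold, and all function names here are illustrative assumptions; the abstract does not specify how ClassClust computes its outlier score.

```python
import math
from collections import defaultdict


def class_centroids(embeddings, labels):
    """Return the mean embedding of each class (hypothetical helper)."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def outlier_score(vec, centroids):
    """Distance to the nearest class centroid, plus that centroid's label."""
    nearest = min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))
    return math.dist(vec, centroids[nearest]), nearest


def predict_or_flag(vec, centroids, threshold):
    """Return the nearest class, or "OUTLIER" if the embedding is too far away.

    The threshold is an assumed tuning parameter, e.g. chosen on validation data.
    """
    score, label = outlier_score(vec, centroids)
    return ("OUTLIER" if score > threshold else label), score


# Toy 2-D embeddings standing in for encoder outputs (made-up values).
train_vecs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
train_labels = ["benign", "benign", "malignant", "malignant"]
centroids = class_centroids(train_vecs, train_labels)

near_decision, _ = predict_or_flag([0.2, 0.4], centroids, threshold=2.0)
far_decision, _ = predict_or_flag([5.0, 5.0], centroids, threshold=2.0)
```

Tighter clusters make this kind of rule more discriminating: when same-class training embeddings lie close together, a test embedding that sits between or away from the clusters stands out more clearly.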

Results: The area under the receiver operating characteristic curve for outlier detection was consistently higher with ClassClust than with the previous methods. Average accuracy on held-out nonoutlier images was also higher, and the visualizations of image regions were more informative with ClassClust.
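
For readers unfamiliar with the metric, the following sketch shows one way an area under the receiver operating characteristic curve can be computed for outlier detection. It uses the pairwise-ranking formulation: the fraction of (outlier, nonoutlier) pairs in which the outlier receives the higher score, with ties counted as half. The function name and the toy scores are assumptions for illustration, not values or code from the study.

```python
def outlier_auroc(scores, is_outlier):
    """AUROC via pairwise comparison of outlier and nonoutlier scores.

    scores: higher means "more likely an outlier".
    is_outlier: booleans, True for held-out outlier cases.
    """
    pos = [s for s, y in zip(scores, is_outlier) if y]
    neg = [s for s, y in zip(scores, is_outlier) if not y]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one nonoutlier")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Made-up scores: a detector that separates the two groups perfectly...
perfect = outlier_auroc([0.1, 0.2, 0.9, 0.8], [False, False, True, True])
# ...and one that ranks only half of the pairs correctly.
chance = outlier_auroc([0.1, 0.9, 0.2, 0.8], [False, False, True, True])
```

A value of 1.0 means every outlier is scored above every nonoutlier, while 0.5 corresponds to chance-level ranking. The quadratic loop is adequate for small examples; a sort-based implementation would be preferable for large test sets.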

Conclusion: The ability to flag outlier test cases need not be at odds with the ability to accurately classify nonoutliers in AI models. Although the latter capability has received research and regulatory attention, AI models for clinical deployment should possess the former as well.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence*
Data Collection
Humans
ROC Curve
Trust*