Assessment of image quality on the diagnostic performance of clinicians and deep learning models: Cross-sectional comparative reader study

J Eur Acad Dermatol Venereol. 2024 Dec 10. doi: 10.1111/jdv.20462. Online ahead of print.

Abstract

Background: Skin cancer is a common and clinically significant condition, and early, accurate diagnosis is crucial for improving patient outcomes. Dermoscopy and artificial intelligence (AI) show promise for enhancing diagnostic accuracy. However, the impact of image quality on diagnostic performance, particularly of high dynamic range (HDR) conversion of smartphone images, remains poorly understood.

Objective: This study aimed to investigate the effect of image quality, including HDR-enhanced dermoscopic images, on the diagnostic performance of clinicians and a convolutional neural network (CNN) model.

Methods: Eighteen dermatology clinicians assessed 303 images of 101 skin lesions, categorized into three image quality groups: low quality (LQ), high quality (HQ) and enhanced quality (EQ), the last produced using HDR-style conversion. Clinicians took part in a two-part reader study in which they recorded a diagnosis, a management decision and a confidence level for each image assessed.

Results: In the binary classification of lesions, clinicians performed best with HQ images, with a sensitivity of 77.3% (CI 69.1-85.5), a specificity of 63.1% (CI 53.7-72.5) and an accuracy of 70.2% (CI 61.3-79.1). In the multiclass classification, overall performance was also best with HQ images, which yielded the highest specificity (91.9%; CI 83.2-95.0) and accuracy (51.5%; CI 48.4-54.7). Clinicians outperformed the CNN model (median correct diagnoses) in the binary classification of LQ and EQ images, whereas performance was comparable on HQ images. However, in the multiclass classification, the CNN model significantly outperformed the clinicians on HQ images (p < 0.01).
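For readers less familiar with the binary metrics reported above, the following minimal sketch shows how sensitivity, specificity and accuracy are conventionally derived from confusion-matrix counts. It assumes the binary task separates malignant from benign lesions; the BinaryCounts type, the helper functions and the counts used are hypothetical illustrations, not the study's data or analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """Hypothetical confusion-matrix counts for a malignant-vs-benign decision."""
    tp: int  # malignant lesions correctly called malignant
    fn: int  # malignant lesions wrongly called benign
    tn: int  # benign lesions correctly called benign
    fp: int  # benign lesions wrongly called malignant


def sensitivity(c: BinaryCounts) -> float:
    """Proportion of malignant lesions that were detected: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: BinaryCounts) -> float:
    """Proportion of benign lesions correctly ruled out: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: BinaryCounts) -> float:
    """Proportion of all lesions classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


if __name__ == "__main__":
    # Illustrative counts only (assumption), not taken from the study.
    counts = BinaryCounts(tp=40, fn=10, tn=30, fp=20)
    print(f"sensitivity={sensitivity(counts):.1%}")
    print(f"specificity={specificity(counts):.1%}")
    print(f"accuracy={accuracy(counts):.1%}")
```

Because accuracy pools both classes, it depends on the ratio of malignant to benign lesions, whereas sensitivity and specificity are each computed within a single class.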

Conclusion: This study highlights the importance of image quality for the diagnostic performance of clinicians and deep learning models. These findings have significant implications for telehealth reporting and triage.