Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images

Achim Hekler; Jochen S Utikal; Alexander H Enk; Wiebke Solass; Max Schmitt; Joachim Klode; Dirk Schadendorf; Wiebke Sondermann; Cindy Franklin; Felix Bestvater; Michael J Flaig; Dieter Krahl; Christof von Kalle; Stefan Fröhling; Titus J Brinker

doi:10.1016/j.ejca.2019.06.012

Eur J Cancer. 2019 Sep:118:91-96. doi: 10.1016/j.ejca.2019.06.012. Epub 2019 Jul 18.

Affiliations

¹ National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany.
² Department of Dermatology, Heidelberg University, Mannheim, Germany; Skin Cancer Unit, German Cancer Research Center, Heidelberg, Germany.
³ Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany.
⁴ Institute of Pathology and Neuropathology, Eberhard-Karls-University Tuebingen and National Center for Pleura and Peritoneum, University of Tuebingen, Germany.
⁵ Department of Dermatology, University Hospital Essen, Essen, Germany.
⁶ Department of Dermatology, University Hospital Cologne, Cologne, Germany.
⁷ Core Facility Unit Light Microscopy, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.
⁸ Department of Dermatology, University Hospital Munich (LMU), Munich, Germany.
⁹ Private Laboratory of Dermatohistopathology, Mönchhofstraße 52, 69120 Heidelberg.
¹⁰ National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany; Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany. Electronic address: titus.brinker@dkfz.de.

PMID: 31325876

Abstract

Background: Most cancers are diagnosed by a board-certified pathologist who examines a tissue biopsy under the microscope. Recent research reveals high discordance between individual pathologists. For melanoma, the literature reports 25-26% discordance in classifying a benign nevus versus a malignant melanoma. A recent study indicated the potential of deep learning to lower this discordance. However, the performance of deep learning in classifying histopathologic melanoma images has never been compared directly with that of human experts. The aim of this study is to perform the first such direct comparison.

Methods: A total of 695 lesions were classified by an expert histopathologist in accordance with current guidelines (350 nevi/345 melanomas). Only the haematoxylin & eosin (H&E) slides of these lesions were digitised via a slide scanner and then randomly cropped. A total of 595 of the resulting images were used to train a convolutional neural network (CNN). The remaining 100 H&E image sections were used to test the results of the CNN in comparison to 11 histopathologists. Three combined McNemar tests comparing the results of the CNN's test runs in terms of sensitivity, specificity and accuracy were predefined to test for significance (p < 0.05).
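
For readers unfamiliar with the McNemar test, the following is a minimal sketch of a single exact two-sided McNemar test on paired classifications of the same images. It is not the authors' combined three-test procedure, which the abstract does not specify. The function name `mcnemar_exact` and the discordant-pair counts are hypothetical and chosen only for illustration.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: images classifier A got right and classifier B got wrong.
    c: images classifier A got wrong and classifier B got right.
    Under the null hypothesis, each discordant pair falls into
    either group with probability 0.5.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: no evidence of any difference.
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) probability of k or fewer in the smaller group.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail and cap at 1.
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts, NOT taken from the study.
p_value = mcnemar_exact(b=10, c=2)
print(f"p = {p_value:.4f}")
```

Only the discordant pairs enter the test; images that both raters classify correctly, or both incorrectly, carry no information about which rater is better.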

Findings: The CNN achieved a mean sensitivity/specificity/accuracy of 76%/60%/68% over 11 test runs. In comparison, the 11 pathologists achieved a mean sensitivity/specificity/accuracy of 51.8%/66.5%/59.2%. Thus, the CNN was significantly (p = 0.016) superior in classifying the cropped images.
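
As a consistency check on these figures, accuracy is the prevalence-weighted average of sensitivity and specificity. The sketch below assumes that the 100 test images were split evenly between melanomas and nevi; the abstract does not state this split, so the value 0.5 and the function name `accuracy_from_rates` are assumptions made for illustration.

```python
def accuracy_from_rates(sensitivity: float,
                        specificity: float,
                        melanoma_fraction: float) -> float:
    """Accuracy as the prevalence-weighted mean of the two rates."""
    return (melanoma_fraction * sensitivity
            + (1.0 - melanoma_fraction) * specificity)


# ASSUMPTION: an even melanoma/nevus split in the test set.
ASSUMED_MELANOMA_FRACTION = 0.5

# Mean rates as reported in the abstract.
cnn_accuracy = accuracy_from_rates(0.76, 0.60, ASSUMED_MELANOMA_FRACTION)
pathologist_accuracy = accuracy_from_rates(0.518, 0.665,
                                           ASSUMED_MELANOMA_FRACTION)

print(f"CNN: {cnn_accuracy:.4f}")
print(f"Pathologists: {pathologist_accuracy:.4f}")
```

Under that assumption the computed values agree with the reported accuracies of 68% and 59.2% up to rounding, which suggests, but does not prove, a balanced test set.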

Interpretation: With limited image information available, a CNN was able to outperform 11 histopathologists in the classification of histopathological melanoma images and thus shows promise to assist human melanoma diagnoses.

Keywords: Artificial intelligence; Deep learning; Histopathology; Melanoma; Pathology.

Publication types

Comparative Study

MeSH terms

Biopsy
Deep Learning*
Diagnosis, Computer-Assisted*
Diagnosis, Differential
Humans
Image Interpretation, Computer-Assisted*
Melanoma / classification
Melanoma / pathology*
Microscopy*
Nevus / classification
Nevus / pathology*
Observer Variation
Pathologists*
Predictive Value of Tests
Reproducibility of Results
Skin Neoplasms / classification
Skin Neoplasms / pathology*