Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task

Titus J Brinker; Achim Hekler; Alexander H Enk; Joachim Klode; Axel Hauschild; Carola Berking; Bastian Schilling; Sebastian Haferkamp; Dirk Schadendorf; Tim Holland-Letz; Jochen S Utikal; Christof von Kalle; Collaborators

doi:10.1016/j.ejca.2019.04.001

Eur J Cancer. 2019 May:113:47-54. doi: 10.1016/j.ejca.2019.04.001. Epub 2019 Apr 10.

Collaborators:
Wiebke Ludwig-Peitsch, Judith Sirokay, Lucie Heinzerling, Magarete Albrecht, Katharina Baratella, Lena Bischof, Eleftheria Chorti, Anna Dith, Christina Drusio, Nina Giese, Emmanouil Gratsias, Klaus Griewank, Sandra Hallasch, Zdenka Hanhart, Saskia Herz, Katja Hohaus, Philipp Jansen, Finja Jockenhöfer, Theodora Kanaki, Sarah Knispel, Katja Leonhard, Anna Martaki, Liliana Matei, Johanna Matull, Alexandra Olischewski, Maximilian Petri, Jan-Malte Placke, Simon Raub, Katrin Salva, Swantje Schlott, Elsa Sody, Nadine Steingrube, Ingo Stoffels, Selma Ugurel, Anne Zaremba, Christoffer Gebhardt, Nina Booken, Maria Christolouka, Kristina Buder-Bakhaya, Therezia Bokor-Billmann, Alexander Enk, Patrick Gholam, Holger Hänßle, Martin Salzmann, Sarah Schäfer, Knut Schäkel, Timo Schank, Ann-Sophie Bohne, Sophia Deffaa, Katharina Drerup, Friederike Egberts, Anna-Sophie Erkens, Benjamin Ewald, Sandra Falkvoll, Sascha Gerdes, Viola Harde, Axel Hauschild, Marion Jost, Katja Kosova, Laetitia Messinger, Malte Metzner, Kirsten Morrison, Rogina Motamedi, Anja Pinczker, Anne Rosenthal, Natalie Scheller, Thomas Schwarz, Dora Stölzl, Federieke Thielking, Elena Tomaschewski, Ulrike Wehkamp, Michael Weichenthal, Oliver Wiedow, Claudia Maria Bär, Sophia Bender-Säbelkampf, Marc Horbrügger, Ante Karoglan, Luise Kraas, Jörg Faulhaber, Cyrill Geraud, Ze Guo, Philipp Koch, Miriam Linke, Nolwenn Maurier, Verena Müller, Benjamin Thomas, Jochen Sven Utikal, Ali Saeed M Alamri, Andrea Baczako, Carola Berking, Matthias Betke, Carolin Haas, Daniela Hartmann, Markus V Heppt, Katharina Kilian, Sebastian Krammer, Natalie Lidia Lapczynski, Sebastian Mastnik, Suzan Nasifoglu, Cristel Ruini, Elke Sattler, Max Schlaak, Hans Wolff, Birgit Achatz, Astrid Bergbreiter, Konstantin Drexler, Monika Ettinger, Sebastian Haferkamp, Anna Halupczok, Marie Hegemann, Verena Dinauer, Maria Maagk, Marion Mickler, Biance Philipp, Anna Wilm, Constanze Wittmann, Anja Gesierich, Valerie Glutsch, Katrin Kahlert, Andreas Kerstan, 
Bastian Schilling, Philipp Schrüfer

Affiliations

¹ National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120 Heidelberg, Germany; Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany. Electronic address: titus.brinker@dkfz.de.
² National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120 Heidelberg, Germany.
³ Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany.
⁴ Department of Dermatology, University Hospital Essen, Essen, Germany.
⁵ Department of Dermatology, University Hospital Kiel, Kiel, Germany.
⁶ Department of Dermatology, University Hospital Munich (LMU), Munich, Germany.
⁷ Department of Dermatology, University Hospital Würzburg, Würzburg, Germany.
⁸ Department of Dermatology, University Hospital Regensburg, Regensburg, Germany.
⁹ Department of Biostatistics, German Cancer Research Center, Heidelberg, Germany.
¹⁰ Department of Dermatology, Heidelberg University, Mannheim, Germany; Skin Cancer Unit, German Cancer Research Center (DKFZ), Heidelberg, Germany.

PMID: 30981091
DOI: 10.1016/j.ejca.2019.04.001

Abstract

Background: Recent studies have demonstrated dermatologist-level classification of suspicious lesions by deep-learning algorithms, but they relied on extensive proprietary image databases and small numbers of dermatologists. This study is the first to compare a deep-learning algorithm trained exclusively on open-source images with a large number of dermatologists covering all levels of the clinical hierarchy.

Methods: We used methods from enhanced deep learning to train a convolutional neural network (CNN) on 12,378 open-source dermoscopic images. We then used 100 images to compare the performance of the CNN with that of 157 dermatologists from 12 university hospitals in Germany. Performance of the CNN relative to the dermatologists was measured in terms of sensitivity, specificity and receiver operating characteristics.
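
For readers less familiar with the two headline metrics, the following is a minimal sketch of how sensitivity and specificity are computed for a single reader on a binary melanoma-versus-nevus task. The function name, the label encoding (1 = melanoma, 0 = nevus) and the example counts are illustrative assumptions; they are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding (not from the study): 1 = melanoma, 0 = nevus.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # melanomas flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # melanomas missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nevi cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nevi flagged
    sensitivity = tp / (tp + fn)  # share of melanomas correctly identified
    specificity = tn / (tn + fp)  # share of nevi correctly identified
    return sensitivity, specificity


# Hypothetical reader on a hypothetical 100-image set (20 melanomas, 80 nevi).
y_true = [1] * 20 + [0] * 80
y_pred = [1] * 15 + [0] * 5 + [0] * 48 + [1] * 32
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this made-up example the reader flags 15 of 20 melanomas and clears 48 of 80 nevi, so each reader corresponds to a single point in sensitivity-specificity space, whereas a classifier that outputs a score traces out a whole receiver operating characteristic curve.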

Findings: The mean sensitivity and specificity achieved by the dermatologists with dermoscopic images were 74.1% (range 40.0%-100%) and 60% (range 21.3%-91.3%), respectively. At a mean sensitivity of 74.1%, the CNN exhibited a mean specificity of 86.5% (range 70.8%-91.3%). At a mean specificity of 60%, the CNN achieved a mean sensitivity of 87.5% (range 80%-95%). Among the dermatologists, the chief physicians showed the highest mean specificity, 69.2%, at a mean sensitivity of 73.3%. At the same specificity of 69.2%, the CNN had a mean sensitivity of 84.5%.
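
The comparisons above fix one metric at the dermatologists' level and read off the other for the CNN. A minimal sketch of that idea for a single set of model scores follows; the abstract does not describe how the operating points were selected, so the threshold sweep, the function name and the toy scores are all assumptions made for illustration.

```python
def specificity_at_sensitivity(y_true, scores, target_sensitivity):
    """Return (threshold, sensitivity, specificity) at the strictest threshold
    whose sensitivity is at least target_sensitivity.

    Assumed convention (not from the study): a lesion is called melanoma
    when its score is >= threshold; labels are 1 = melanoma, 0 = nevus.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    # Lowering the threshold can only raise sensitivity and lower specificity,
    # so the first threshold that reaches the target has the best specificity.
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            return threshold, sensitivity, specificity
    raise ValueError("target_sensitivity must not exceed 1.0")


# Toy scores for 4 melanomas and 6 nevi (hypothetical values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
thr, sens_at, spec_at = specificity_at_sensitivity(y_true, scores, 0.75)
print(f"threshold={thr}, sensitivity={sens_at:.2f}, specificity={spec_at:.3f}")
```

Swapping the roles of the two classes gives the mirror-image comparison (sensitivity at a fixed specificity). The abstract reports means and ranges for the CNN, which this single-run sketch does not reproduce.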

Interpretation: A CNN trained exclusively on open-source images outperformed 136 of the 157 dermatologists, and every level of experience (from junior to chief physicians), in terms of average specificity and sensitivity.

Keywords: Artificial intelligence; Melanoma; Skin cancer.

MeSH terms

Deep Learning*
Dermatologists*
Dermoscopy*
Germany
Hospitals, University
Humans
Melanoma / diagnosis*
Melanoma / pathology
Nevus / diagnosis*
Nevus / pathology
Sensitivity and Specificity
Skin Neoplasms / diagnosis*
Skin Neoplasms / pathology