Unveiling the risks of ChatGPT in diagnostic surgical pathology

Vincenzo Guastafierro; Devin N Corbitt; Alessandra Bressan; Bethania Fernandes; Ömer Mintemur; Francesca Magnoli; Susanna Ronchi; Stefano La Rosa; Silvia Uccella; Salvatore Lorenzo Renne

doi:10.1007/s00428-024-03918-1

Virchows Arch. 2024 Sep 13. doi: 10.1007/s00428-024-03918-1. Online ahead of print.

Authors

Vincenzo Guastafierro¹ ², Devin N Corbitt¹, Alessandra Bressan¹ ², Bethania Fernandes², Ömer Mintemur², Francesca Magnoli³, Susanna Ronchi³, Stefano La Rosa³ ⁴, Silvia Uccella¹ ², Salvatore Lorenzo Renne⁵ ⁶

Affiliations

¹ Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072, Pieve Emanuele, Milan, Italy.
² Department of Pathology, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy.
³ Unit of Pathology, Department of Oncology, ASST Sette Laghi, Varese, Italy.
⁴ Unit of Pathology, Department of Medicine and Technological Innovation, University of Insubria, Varese, Italy.
⁵ Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072, Pieve Emanuele, Milan, Italy. salvatore.renne@hunimed.eu.
⁶ Department of Pathology, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy. salvatore.renne@hunimed.eu.

PMID: 39269615
DOI: 10.1007/s00428-024-03918-1

Abstract

ChatGPT, an AI capable of processing and generating human-like language, has been studied in medical education and care, yet its potential in histopathological diagnosis remains unexplored. This study evaluates ChatGPT's reliability in addressing pathology-related diagnostic questions across ten subspecialties and its ability to provide scientific references. We crafted five clinico-pathological scenarios per subspecialty, simulating a pathologist using ChatGPT to refine differential diagnoses. Each scenario, aligned with current diagnostic guidelines and validated by expert pathologists, was posed as an open-ended or multiple-choice question, with or without a request for scientific references. Outputs were assessed by six pathologists according to (1) usefulness in supporting the diagnosis and (2) absolute number of errors. We used directed acyclic graphs and structural causal models to determine the effect of each scenario type, field, question modality, and pathologist evaluation. We obtained 894 evaluations. ChatGPT provided useful answers in 62.2% of cases, and 32.1% of outputs contained no errors, while the remainder contained at least one error. ChatGPT provided 214 bibliographic references: 70.1% correct, 12.1% inaccurate, and 17.8% non-existent. Scenario variability had the greatest impact on ratings, and latent knowledge across fields showed minimal variation. Although ChatGPT provided useful responses in one-third of cases, the frequency of errors and the variability of its output underscore its inadequacy for routine diagnostic use and highlight the need for discretion when it is used as a support tool. Imprecise referencing also calls for caution when it is used as a self-learning tool. It is essential to recognize the irreplaceable role of human experts in synthesizing images, clinical data, and experience for the intricate task of histopathological diagnosis.

Keywords: Accuracy; ChatGPT; Large language model; Surgical pathology; Usefulness.