Despite significant strides in big data technology, extracting information from unstructured clinical data remains a formidable challenge. This study investigated the utility of large language models (LLMs) for extracting clinical data from unstructured radiology reports without additional training. In this retrospective study, 1800 radiology reports, 600 from each of three university hospitals, were collected, and seven pulmonary outcomes were defined. Three pulmonology-trained specialists determined the presence or absence of each disease. Data extraction from the reports was performed using Google Gemini Pro 1.0, OpenAI's GPT-3.5, and GPT-4. The gold standard was based on agreement among at least two of the pulmonologists. This study evaluated the performance of the three LLMs in identifying seven pulmonary diseases (active tuberculosis, emphysema, interstitial lung disease, lung cancer, pleural effusion, pneumonia, and pulmonary edema) from chest radiography and chest computed tomography reports. All models exhibited high accuracy (0.85-1.00) for most conditions. GPT-4 consistently outperformed its counterparts, demonstrating a sensitivity of 0.71-1.00, a specificity of 0.89-1.00, and an accuracy of 0.89-0.99 across both modalities, underscoring its superior capability in interpreting radiology reports. Notably, accuracy for pleural effusion and emphysema on chest radiograph reports and for pulmonary edema on chest computed tomography reports reached 0.99. The proficiency of LLMs, particularly GPT-4, in accurately classifying unstructured radiology data suggests their potential as an alternative to traditional manual chart review by clinicians.
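For illustration, a minimal sketch of how such zero-shot extraction might be implemented against one of the evaluated models (here GPT-4 via the OpenAI Python SDK). The prompt wording, answer parsing, and helper names are illustrative assumptions, not the study's actual protocol, which is described in the Methods.

```python
from openai import OpenAI  # assumes the openai>=1.0 Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The seven pulmonary outcomes evaluated in the study
DISEASES = [
    "active tuberculosis", "emphysema", "interstitial lung disease",
    "lung cancer", "pleural effusion", "pneumonia", "pulmonary edema",
]

def classify_report(report_text: str, disease: str, model: str = "gpt-4") -> bool:
    """Zero-shot query: does the report indicate the given disease? Returns True/False."""
    prompt = (
        "You are reviewing a radiology report.\n"
        f"Report:\n{report_text}\n\n"
        f"Does this report indicate the presence of {disease}? "
        "Answer with a single word: yes or no."
    )
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output for a binary classification task
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content.strip().lower()
    return answer.startswith("yes")

# Example: screen a single (hypothetical) report for all seven outcomes
report = "Chest CT shows a small right pleural effusion. No consolidation or mass."
labels = {d: classify_report(report, d) for d in DISEASES}
print(labels)
```

In this kind of setup, model outputs would then be compared against the pulmonologist-adjudicated gold standard to compute sensitivity, specificity, and accuracy per disease and modality.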