Natural language processing to identify and characterize spondyloarthritis in clinical practice

RMD Open. 2024 May 24;10(2):e004302. doi: 10.1136/rmdopen-2024-004302.

Abstract

Objective: This study aims to use a novel technology based on natural language processing (NLP) to extract clinical information from electronic health records (EHRs) to characterise the clinical profile of patients diagnosed with spondyloarthritis (SpA) at a large-scale hospital.

Methods: An observational, retrospective analysis was conducted on EHR data from all patients with SpA (including psoriatic arthritis (PsA)) at Hospital Universitario La Paz between 2020 and 2022. Data were collected using Savana Manager, an NLP-based system that extracts information from unstructured, free-text EHRs. Variables analysed included demographic data, SpA subtypes, comorbidities and treatments. The performance of the technology in detecting SpA clinical entities was evaluated using precision, recall and F1 score.
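
Precision, recall and F1 score follow from counts of true positives (TP), false positives (FP) and false negatives (FN) obtained by comparing the system's detections against a reference annotation. The sketch below shows only these standard definitions. The abstract does not describe how the reference annotation was built, so the DetectionCounts type and the counts in the usage line are hypothetical illustrations, not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Hypothetical counts from comparing NLP detections with a reference annotation."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(c: DetectionCounts) -> float:
    # Fraction of entities flagged by the system that are correct: TP / (TP + FP).
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def recall(c: DetectionCounts) -> float:
    # Fraction of reference entities the system found: TP / (TP + FN).
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


def f1_score(c: DetectionCounts) -> float:
    # Harmonic mean of precision and recall, equal to 2TP / (2TP + FP + FN).
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only (not from the study).
counts = DetectionCounts(true_positives=90, false_positives=10, false_negatives=20)
print(round(f1_score(counts), 3))  # 0.857
```

The harmonic mean is used because it penalises an imbalance between precision and recall, so an F1 above 0.80 requires both to be reasonably high.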

Results: From a hospital population of 639 474 patients, 4337 (0.7%) had a diagnosis of SpA or one of its subtypes recorded in their EHR. The majority were men (55.3%), and the mean age was 50.9 years. Peripheral SpA (including PsA) was reported in 31.6% of patients, axial SpA in 20.9% and both axial and peripheral SpA in 3.7%, while the SpA subtype was not reported for 43.7%. Common comorbidities included hypertension (25.0%), dyslipidaemia (22.2%) and diabetes mellitus (15.5%). The use of conventional synthetic disease-modifying antirheumatic drugs (csDMARDs) and biological DMARDs (bDMARDs) was documented; methotrexate (25.3% of patients) was the most used csDMARD and adalimumab (10.6% of patients) the most used bDMARD. The NLP technology demonstrated high precision and recall, with all assessed F1 scores above 0.80, indicating reliable data extraction.
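
The headline proportion and the subtype breakdown can be checked with simple arithmetic. This is a minimal sketch using only the figures reported above; proportion_pct is a hypothetical helper name, and the result is a share of the hospital population, not a population-level prevalence estimate.

```python
def proportion_pct(cases: int, population: int) -> float:
    """Share of a population with a recorded diagnosis, as a percentage."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * cases / population


# Patients with an SpA diagnosis recorded in the EHR, out of the hospital population.
spa_pct = proportion_pct(4337, 639_474)
print(f"{spa_pct:.1f}%")  # 0.7%

# Reported subtype shares (percent of the 4337 SpA patients). They sum to 99.9
# rather than 100 because each figure is rounded to one decimal place.
subtype_shares = {
    "peripheral": 31.6,
    "axial": 20.9,
    "axial_and_peripheral": 3.7,
    "not_reported": 43.7,
}
print(round(sum(subtype_shares.values()), 1))  # 99.9
```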

Conclusion: The application of NLP technology facilitated the characterisation of the SpA patient profile, including demographics, clinical features, comorbidities and treatments. This study supports the utility of NLP for improving the understanding of SpA and suggests its potential to improve patient management by extracting meaningful information from unstructured EHR data.

Keywords: Machine Learning; Outcome Assessment, Health Care; Spondyloarthritis.

Publication types

  • Observational Study

MeSH terms

  • Adult
  • Antirheumatic Agents / therapeutic use
  • Arthritis, Psoriatic / diagnosis
  • Arthritis, Psoriatic / epidemiology
  • Comorbidity
  • Electronic Health Records*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Natural Language Processing*
  • Retrospective Studies
  • Spondylarthritis* / diagnosis
  • Spondylarthritis* / drug therapy
  • Spondylarthritis* / epidemiology

Substances

  • Antirheumatic Agents