Precision Phenotyping for Curating Research Cohorts of Patients with Post-Acute Sequelae of COVID-19 (PASC) as a Diagnosis of Exclusion

Alaleh Azhir; Jonas Hügel; Jiazi Tian; Jingya Cheng; Ingrid V Bassett; Douglas S Bell; Elmer V Bernstam; Maha R Farhat; Darren W Henderson; Emily S Lau; Michele Morris; Yevgeniy R Semenov; Virginia A Triant; Shyam Visweswaran; Zachary H Strasser; Jeffrey G Klann; Shawn N Murphy; Hossein Estiri

doi:10.1101/2024.04.13.24305771

medRxiv [Preprint]. 2024 Apr 16:2024.04.13.24305771. doi: 10.1101/2024.04.13.24305771.

Abstract

Scalable identification of patients with post-acute sequelae of COVID-19 (PASC) is challenging because reproducible precision phenotyping algorithms are lacking and the PASC diagnosis code (ICD-10 U09.9) has suboptimal accuracy, carries demographic biases, and underestimates prevalence. In a retrospective case-control study, we developed a precision phenotyping algorithm for identifying research cohorts of PASC patients, with PASC defined as a diagnosis of exclusion. We used longitudinal electronic health record (EHR) data from over 295 thousand patients from 14 hospitals and 20 community health centers in Massachusetts. The algorithm employs an attention mechanism to exclude sequelae that prior conditions can explain. We performed independent chart reviews to tune and validate the algorithm. Compared with the U09.9 diagnosis code, the algorithm improves precision and prevalence estimation and reduces bias in identifying Long COVID patients. It identified a PASC research cohort of over 24 thousand patients (compared with about 6 thousand using the U09.9 diagnosis code) with a precision of 79.9 percent (compared with 77.8 percent for the U09.9 diagnosis code). Our estimated prevalence of PASC was 22.8 percent, which is close to the national estimates for the region. We also provide an in-depth analysis of clinical attributes, including identified lingering effects by organ, comorbidity profiles, and temporal differences in the risk of PASC. In summary, the phenotyping method achieves higher precision than the U09.9 diagnosis code, estimates the prevalence of PASC without underestimating it, and shows less bias in identifying Long COVID patients. The PASC cohort derived from our algorithm will serve as a springboard for studying the genetic, metabolomic, and clinical features of Long COVID, overcoming the constraints of recent PASC cohort studies, which were limited by their size and available outcome data.

Publication types

Preprint