A Harmonised Approach to Curating Research-Ready Datasets for Asthma, Chronic Obstructive Pulmonary Disease (COPD) and Interstitial Lung Disease (ILD) in England, Wales and Scotland Using Clinical Practice Research Datalink (CPRD), Secure Anonymised Information Linkage (SAIL) Databank and DataLoch

Clin Epidemiol. 2024 Apr 4:16:235-247. doi: 10.2147/CLEP.S437937. eCollection 2024.

Abstract

Background: Electronic healthcare records (EHRs) are an important resource for health research that can be used to improve patient outcomes in chronic respiratory diseases. However, consistent approaches to analysing these datasets are needed both for coherent messaging and for comparative studies across different populations.

Methods and results: We developed a harmonised curation approach to generate comparable patient cohorts for asthma, chronic obstructive pulmonary disease (COPD) and interstitial lung disease (ILD) using datasets from Clinical Practice Research Datalink (CPRD; for England), Secure Anonymised Information Linkage (SAIL; for Wales) and DataLoch (for Scotland), by defining commonly derived variables consistently across the datasets. Working in parallel on the curation methodology for all three diseases in CPRD, SAIL and DataLoch allowed us to highlight key differences in coding and recording between the databases and to identify solutions that enable valid comparisons.

Conclusion: The codelists and metadata generated have been made available to help re-create the asthma, COPD and ILD cohorts in CPRD, SAIL and DataLoch for different time periods. They also provide a starting point for curating respiratory datasets in other EHR databases, expediting further comparable respiratory research.

Keywords: COPD; EHR; ILD; asthma; data curation; harmonisation.

Grants and funding

This work is supported by BREATHE-The Health Data Research Hub for Respiratory Health (MC_PC_19004). BREATHE is funded through the UK Research and Innovation Industrial Strategy Challenge Fund with additional support from the Medical Research Council and delivered through Health Data Research UK. Infrastructure support for this research was provided by the NIHR Imperial Biomedical Research Centre (BRC).