Objective: To evaluate the accuracy of disease codes and free text in identifying upper gastrointestinal bleeding (UGIB) from electronic health-care records (EHRs).
Study design and setting: We conducted a validation study in four European EHR databases: Integrated Primary Care Information (IPCI), Health Search/CSD Patient Database (HSD), ARS, and Aarhus. In these databases we identified UGIB cases using free text or disease codes: (1) International Classification of Diseases (ICD)-9 (HSD, ARS); (2) ICD-10 (Aarhus); and (3) International Classification of Primary Care (ICPC) (IPCI). From each database, we randomly selected and manually reviewed 200 cases to calculate positive predictive values (PPVs). We employed different case definitions to assess the effect of outcome misclassification on the estimation of risk of drug-related UGIB.
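The PPV calculation described above can be illustrated with a minimal sketch. The abstract does not state which confidence interval method was used; the Wilson score interval below is an assumption chosen for illustration, and the function name `ppv_with_wilson_ci` and the counts in the usage line are hypothetical, not study data.

```python
import math

def ppv_with_wilson_ci(confirmed, reviewed, z=1.959964):
    """Return (ppv, lower, upper) for a reviewed sample of candidate cases.

    confirmed: candidate cases confirmed as true cases on manual review
    reviewed:  total candidate cases reviewed
    z:         normal quantile (default corresponds to a 95% interval)

    Assumption: Wilson score interval; the source does not name its method.
    """
    if reviewed <= 0 or not 0 <= confirmed <= reviewed:
        raise ValueError("need 0 <= confirmed <= reviewed and reviewed > 0")
    p = confirmed / reviewed
    z2 = z * z
    denom = 1 + z2 / reviewed
    center = (p + z2 / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z2 / (4 * reviewed ** 2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts for illustration only (not taken from the study).
ppv, low, high = ppv_with_wilson_ci(confirmed=150, reviewed=200)
print(f"PPV = {ppv:.0%} (95% CI: {low:.0%}, {high:.0%})")
```

The Wilson interval stays within [0, 1] and behaves reasonably for proportions near the extremes, which is why it is used in this sketch; an exact (Clopper-Pearson) interval would be an equally plausible choice.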
Results: In IPCI, PPV was 22% [95% confidence interval (CI): 16, 28] for free text and 21% (95% CI: 16, 28) for ICPC codes. In HSD, PPV was 91% (95% CI: 86, 95) for ICD-9 codes and 47% (95% CI: 35, 59) for free text. PPV was 72% (95% CI: 65, 78) for ICD-9 codes in ARS and 77% (95% CI: 69, 83) for ICD-10 codes in Aarhus. More specific definitions did not have a significant impact on risk estimation of drug-related UGIB, except for wider CIs.
Conclusions: ICD-9-CM and ICD-10 disease codes have good PPV in identifying UGIB from EHRs; less granular terminology (ICPC) may require additional strategies. Use of more specific UGIB definitions affects the precision, but not the magnitude, of risk estimates.
Keywords: Drug safety; Non-steroidal anti-inflammatory agents; Positive predictive value; Signal detection; Upper gastrointestinal bleeding; Validation study.
Copyright © 2014 Elsevier Inc. All rights reserved.