A General Propensity Score for Signal Identification Using Tree-Based Scan Statistics

Shirley V Wang; Judith C Maro; Joshua J Gagne; Elisabetta Patorno; Sushama Kattinakere; Danijela Stojanovic; Efe Eworuke; Elande Baro; Rita Ouellet-Hellstrom; Michael Nguyen; Yong Ma; Inna Dashevsky; David Cole; Sandra DeLuccia; Aaron Hansbury; Ella Pestine; Martin Kulldorff

doi:10.1093/aje/kwab034

Am J Epidemiol. 2021 Jul 1;190(7):1424-1433. doi: 10.1093/aje/kwab034.

PMID: 33615330

Abstract

The tree-based scan statistic (TreeScan; Martin Kulldorff, Harvard Medical School, Boston, Massachusetts) is a data-mining method that adjusts for multiple testing of correlated hypotheses when screening thousands of potential adverse events for signal identification. Simulation has demonstrated the promise of TreeScan with a propensity score (PS)-matched cohort design. However, it is unclear which variables to include in a PS for applied signal identification studies to simultaneously adjust for confounding across potential outcomes. We selected 4 pairs of medications with well-understood safety profiles. For each pair, we evaluated 5 candidate PSs with different combinations of 1) predefined general covariates (comorbidity, frailty, utilization), 2) empirically selected (data-driven) covariates, and 3) covariates tailored to the drug pair. For each pair, statistical alerting patterns were similar with alternative PSs (≤11 alerts in 7,996 outcomes scanned). Inclusion of covariates tailored to exposure did not appreciably affect screening results. Inclusion of empirically selected covariates can provide better proxy coverage for confounders but can also decrease statistical power. Unlike tailored covariates, empirical and predefined general covariates can be applied "out of the box" for signal identification. The choice of PS depends on the level of concern about residual confounding versus loss of power. Potential signals should be followed by pharmacoepidemiologic assessment where confounding control is tailored to the specific outcome(s) under investigation.
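The abstract does not spell out how TreeScan adjusts for multiple testing across correlated outcomes. As a rough illustration only, the sketch below assumes the Bernoulli form of the tree-based scan statistic applied to a 1:1 PS-matched cohort with equal follow-up. Under that assumption, each case is exposed with probability p = 0.5 under the null hypothesis (1/(1 + k) for 1:k matching). The outcome tree, the counts, and the names PARENT, TOY_CASES, node_counts, bernoulli_llr and tree_scan are hypothetical. They are not taken from the study or from the TreeScan software. Each node's log-likelihood ratio is compared against the Monte Carlo distribution of the largest ratio anywhere in the tree, which is what adjusts the many overlapping node-level tests jointly.

```python
"""Toy sketch of a Bernoulli tree-based scan statistic (hypothetical data)."""
import math
import random

# Hypothetical outcome hierarchy: node -> parent (None marks the root).
PARENT = {
    "root": None,
    "cardio": "root",
    "MI": "cardio",
    "stroke": "cardio",
    "gi": "root",
    "bleed": "gi",
    "ulcer": "gi",
}

# Hypothetical leaf counts from a 1:1 matched cohort: (exposed cases, all cases).
TOY_CASES = {"MI": (10, 20), "stroke": (8, 16), "bleed": (40, 45), "ulcer": (6, 12)}


def node_counts(leaf_cases, parent):
    """Roll leaf-level (exposed, total) case counts up to every ancestor node."""
    counts = {node: [0, 0] for node in parent}
    for leaf, (exposed, total) in leaf_cases.items():
        node = leaf
        while node is not None:
            counts[node][0] += exposed
            counts[node][1] += total
            node = parent[node]
    return counts


def bernoulli_llr(c, n, p):
    """One-sided log-likelihood ratio for an excess of exposed cases (c of n)."""
    if n == 0 or c / n <= p:
        return 0.0
    llr = c * math.log(c / (n * p))
    if c < n:  # the (n - c) term vanishes when every case is exposed
        llr += (n - c) * math.log((n - c) / (n * (1 - p)))
    return llr


def tree_scan(leaf_cases, parent, p=0.5, n_sims=999, seed=0):
    """Return {node: (LLR, tree-wide Monte Carlo p-value)}."""
    observed = node_counts(leaf_cases, parent)
    obs_llr = {node: bernoulli_llr(c, n, p) for node, (c, n) in observed.items()}

    rng = random.Random(seed)
    max_null = []
    for _ in range(n_sims):
        # Null: each case is exposed with probability p regardless of its outcome;
        # the number of cases per leaf is held fixed.
        sim_leaves = {
            leaf: (sum(rng.random() < p for _ in range(total)), total)
            for leaf, (_, total) in leaf_cases.items()
        }
        sim = node_counts(sim_leaves, parent)
        # Keep only the largest LLR anywhere in the tree for this replicate.
        max_null.append(max(bernoulli_llr(c, n, p) for c, n in sim.values()))

    # Comparing every node with the null distribution of the tree-wide maximum
    # is what adjusts for scanning many correlated nodes at once.
    return {
        node: (llr, (1 + sum(m >= llr for m in max_null)) / (n_sims + 1))
        for node, llr in obs_llr.items()
    }


if __name__ == "__main__":
    results = tree_scan(TOY_CASES, PARENT)
    for node, (llr, pval) in sorted(results.items(), key=lambda kv: -kv[1][0]):
        print(f"{node:7s} LLR={llr:6.2f} p={pval:.3f}")
```

Running the module prints every node ranked by its log-likelihood ratio. With these made-up counts, the bleed leaf and its gi parent carry the largest ratios. This illustrates only the mechanics and is not a result of the study. Scanning internal nodes as well as leaves lets an excess spread across related outcomes surface even when no single leaf stands out.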

Keywords: TreeScan; propensity score; real-world data; signal identification.

Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health 2021. This work is written by (a) US Government employee(s) and is in the public domain in the US.

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Cohort Studies
Data Interpretation, Statistical*
Data Mining / methods*
Drug Evaluation / statistics & numerical data*
Humans
Pharmacoepidemiology / methods*
Propensity Score*

Grants and funding

HHSF22301003T/FD/FDA HHS/United States