Machine learning for improving high-dimensional proxy confounder adjustment in healthcare database studies: An overview of the current literature

Richard Wyss; Chen Yanover; Tal El-Hay; Dimitri Bennett; Robert W Platt; Andrew R Zullo; Grammati Sari; Xuerong Wen; Yizhou Ye; Hongbo Yuan; Mugdha Gokhale; Elisabetta Patorno; Kueiyu Joshua Lin

doi:10.1002/pds.5500

Pharmacoepidemiol Drug Saf. 2022 Sep;31(9):932-943. doi: 10.1002/pds.5500. Epub 2022 Jul 5.

Authors

Richard Wyss¹, Chen Yanover², Tal El-Hay²,³, Dimitri Bennett⁴, Robert W Platt⁵, Andrew R Zullo⁶, Grammati Sari⁷, Xuerong Wen⁸, Yizhou Ye⁹, Hongbo Yuan¹⁰, Mugdha Gokhale¹¹, Elisabetta Patorno¹, Kueiyu Joshua Lin¹,¹²

Affiliations

¹ Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
² KI Research Institute, Kfar Malal, Israel.
³ IBM Research-Haifa Labs, Haifa, Israel.
⁴ Global Evidence and Outcomes, Takeda Pharmaceutical Company Ltd., Cambridge, Massachusetts, USA.
⁵ Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Canada.
⁶ Department of Health Services, Policy, and Practice, Brown University School of Public Health and Center of Innovation in Long-Term Services and Supports, Providence Veterans Affairs Medical Center, Providence, Rhode Island, USA.
⁷ Real World Evidence Strategy Lead, Visible Analytics Ltd, Oxford, UK.
⁸ Health Outcomes, Pharmacy Practice, College of Pharmacy, University of Rhode Island, Kingston, Rhode Island, USA.
⁹ Global Epidemiology, AbbVie Inc., Illinois, USA.
¹⁰ Canadian Agency for Drugs and Technologies in Health, Ottawa, Canada.
¹¹ Pharmacoepidemiology, Center for Observational and Real-world Evidence, Merck, Pennsylvania, USA.
¹² Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA.

Abstract

Purpose: Supplementing investigator-specified variables with large numbers of empirically identified features that collectively serve as 'proxies' for unspecified or unmeasured factors can often improve confounding control in studies utilizing administrative healthcare databases. Consequently, there has been a recent focus on the development of data-driven methods for high-dimensional proxy confounder adjustment in pharmacoepidemiologic research. In this paper, we survey current approaches and recent advancements for high-dimensional proxy confounder adjustment in healthcare database studies.

Methods: We discuss considerations underpinning three areas for high-dimensional proxy confounder adjustment: (1) feature generation, that is, transforming raw data into covariates (or features) to be used for proxy adjustment; (2) covariate prioritization, selection, and adjustment; and (3) diagnostic assessment. We discuss challenges and avenues of future development within each area.
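To make the three areas concrete, the following is a minimal Python sketch on toy data. It is not a method taken from the paper: the patient records, the code values, and the function names (`generate_features`, `standardized_difference`, `prioritize`) are all hypothetical. Ranking features by exposure imbalance alone is a deliberate simplification chosen for brevity; approaches used in practice typically also consider each feature's association with the outcome.

```python
from math import sqrt

# Hypothetical toy data: patient id -> (treated flag, recorded codes).
patients = {
    "p1": (1, ["E11", "I10", "E11"]),
    "p2": (1, ["E11", "N18"]),
    "p3": (1, ["E11"]),
    "p4": (0, ["I10"]),
    "p5": (0, ["J45", "I10"]),
    "p6": (0, ["E11", "J45"]),
}


def generate_features(patients):
    """Area 1: turn raw codes into binary 'ever recorded' indicators."""
    codes = sorted({c for _, recorded in patients.values() for c in recorded})
    rows = {
        pid: {c: int(c in recorded) for c in codes}
        for pid, (_, recorded) in patients.items()
    }
    return codes, rows


def standardized_difference(code, rows, patients):
    """Area 3: standardized mean difference of a binary feature
    between treated and untreated patients (0.0 if it never varies)."""
    treated = [rows[pid][code] for pid, (t, _) in patients.items() if t == 1]
    control = [rows[pid][code] for pid, (t, _) in patients.items() if t == 0]
    p1 = sum(treated) / len(treated)
    p0 = sum(control) / len(control)
    pooled_sd = sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    return 0.0 if pooled_sd == 0 else (p1 - p0) / pooled_sd


def prioritize(codes, rows, patients, k):
    """Area 2 (simplified assumption): keep the k features with the
    largest absolute imbalance between treatment groups."""
    ranked = sorted(
        codes,
        key=lambda c: abs(standardized_difference(c, rows, patients)),
        reverse=True,
    )
    return ranked[:k]


codes, rows = generate_features(patients)
selected = prioritize(codes, rows, patients, k=2)
# Diagnostic table: imbalance of each selected feature before adjustment.
balance = {c: standardized_difference(c, rows, patients) for c in selected}
```

In a real study the `balance` table would be recomputed after adjustment (for example, within a weighted or matched sample) to check whether the selected proxies are balanced across treatment groups.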

Results: There is a large literature on methods for high-dimensional confounder prioritization/selection, but relatively little has been written on best practices for feature generation and diagnostic assessment. Consequently, these areas have particular limitations and challenges.

Conclusions: There is a growing body of evidence showing that machine-learning algorithms for high-dimensional proxy-confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. However, more research is needed on best practices for feature generation and diagnostic assessment when applying methods for high-dimensional proxy confounder adjustment in pharmacoepidemiologic studies.

Keywords: causal inference; confounding; machine learning.

Publication types

Review
Research Support, Non-U.S. Gov't

MeSH terms

Confounding Factors, Epidemiologic
Databases, Factual
Delivery of Health Care
Humans
Machine Learning*
Pharmacoepidemiology*

Grants and funding

R01 AG065722/AG/NIA NIH HHS/United States