Mendelian randomization analysis with multiple genetic variants using summarized data

Stephen Burgess; Adam Butterworth; Simon G Thompson

doi:10.1002/gepi.21758

Genet Epidemiol. 2013 Nov;37(7):658-65. doi: 10.1002/gepi.21758. Epub 2013 Sep 20.

Authors

Stephen Burgess¹, Adam Butterworth, Simon G Thompson

Affiliation

¹ Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom.

Abstract

Genome-wide association studies, which typically report regression coefficients summarizing the associations of many genetic variants with various traits, are potentially a powerful source of data for Mendelian randomization investigations. We demonstrate how such coefficients from multiple variants can be combined in a Mendelian randomization analysis to estimate the causal effect of a risk factor on an outcome. The bias and efficiency of estimates based on summarized data are compared to those based on individual-level data in simulation studies. We investigate the impact of gene-gene interactions, linkage disequilibrium, and 'weak instruments' on these estimates. Both an inverse-variance weighted average of variant-specific associations and a likelihood-based approach for summarized data give similar estimates and precision to the two-stage least squares method for individual-level data, even when there are gene-gene interactions. However, these summarized data methods overstate precision when variants are in linkage disequilibrium. If the P-value in a linear regression of the risk factor on each variant is less than 1×10⁻⁵, then weak instrument bias will be small. We use these methods to estimate the causal effect of low-density lipoprotein cholesterol (LDL-C) on coronary artery disease using published data on five genetic variants. A 30% reduction in LDL-C is estimated to reduce coronary artery disease risk by 67% (95% CI: 54% to 76%). We conclude that Mendelian randomization investigations using summarized data from uncorrelated variants are similarly efficient to those using individual-level data, although the necessary assumptions cannot be assessed as fully.

Keywords: Mendelian randomization; causal inference; genome-wide association study; instrumental variables; weak instruments.
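
The inverse-variance weighted combination described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the commonly used form of the estimator, in which each variant's ratio estimate (outcome association divided by risk-factor association) is weighted by the inverse of its approximate variance, and it assumes the variants are uncorrelated (the abstract notes that precision is overstated under linkage disequilibrium). The function name `ivw_estimate` and the input values are hypothetical and chosen only for illustration; they are not the published LDL-C data.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted causal estimate from summarized data.

    beta_x: per-variant associations with the risk factor
    beta_y: per-variant associations with the outcome
    se_y:   standard errors of the outcome associations

    Assumes uncorrelated variants and treats beta_x as known
    (uncertainty in the risk-factor associations is ignored).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("all inputs must have the same length")

    # Numerator: sum of beta_x * beta_y / se_y^2 across variants.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_x, beta_y, se_y))
    # Denominator: sum of beta_x^2 / se_y^2 across variants.
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_x, se_y))
    if denominator == 0:
        raise ValueError("no variant is associated with the risk factor")

    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical summarized associations for three variants (illustrative only).
beta_x = [0.1, 0.2, 0.4]
beta_y = [0.2, 0.4, 0.8]
se_y = [0.05, 0.05, 0.1]

estimate, std_error = ivw_estimate(beta_x, beta_y, se_y)
lower = estimate - 1.96 * std_error  # approximate 95% confidence limits
upper = estimate + 1.96 * std_error
print(f"IVW estimate: {estimate:.3f} (95% CI: {lower:.3f} to {upper:.3f})")
```

For a binary outcome such as coronary artery disease, the outcome associations would typically be log odds ratios, so the combined estimate and its confidence limits would be exponentiated before being reported.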

MeSH terms

Bias
Cholesterol, LDL / biosynthesis
Cholesterol, LDL / genetics
Cholesterol, LDL / metabolism
Coronary Disease / genetics
Coronary Disease / metabolism
Coronary Disease / physiopathology
Genes / genetics
Genetic Variation / genetics*
Genome-Wide Association Study
Humans
Least-Squares Analysis
Likelihood Functions
Linear Models
Linkage Disequilibrium / genetics
Mendelian Randomization Analysis / methods*
Models, Genetic
Odds Ratio
Phenotype
Risk Factors

Substances

Cholesterol, LDL
