Improved polygenic prediction by Bayesian multiple regression on summary statistics

Nat Commun. 2019 Nov 8;10(1):5086. doi: 10.1038/s41467-019-12653-0.

Abstract

Accurate prediction of an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
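
As a rough illustration of how a figure such as "improves prediction R² by 5.2% relative to LDpred" can be read, the sketch below computes prediction R² as the squared Pearson correlation between a predicted score and the observed phenotype, then expresses one method's R² as a percentage change over a baseline. This assumes the reported percentages are relative changes in R²; the abstract does not spell out the formula. The function names `prediction_r2` and `relative_improvement` are hypothetical, and the numbers in the usage lines are made up for illustration, not taken from the paper.

```python
def prediction_r2(predicted, observed):
    """Squared Pearson correlation between predicted scores and observed phenotypes."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("R^2 is undefined when either input is constant")
    return cov * cov / (var_p * var_o)


def relative_improvement(r2_new, r2_baseline):
    """Percentage change of r2_new over r2_baseline (assumed definition)."""
    if r2_baseline <= 0:
        raise ValueError("baseline R^2 must be positive")
    return 100.0 * (r2_new - r2_baseline) / r2_baseline


# Toy values only (not results from the paper):
toy_r2 = prediction_r2([0.1, 0.4, 0.2, 0.9], [1.0, 1.5, 1.1, 2.4])
toy_gain = relative_improvement(0.3156, 0.30)  # about 5.2 (percent)
```

Under this reading, a 5.2% improvement on a baseline R² of 0.30 is an absolute gain of roughly 0.016, not 5.2 percentage points of variance explained.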

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adipose Tissue
  • Alopecia / genetics
  • Basal Metabolism / genetics
  • Bayes Theorem*
  • Biological Specimen Banks
  • Birth Weight / genetics
  • Body Composition / genetics
  • Body Height / genetics
  • Body Mass Index
  • Bone Density / genetics
  • Diabetes Mellitus, Type 2 / genetics
  • Forced Expiratory Volume / genetics
  • Genetic Association Studies
  • Genome-Wide Association Study
  • Humans
  • Multifactorial Inheritance / genetics*
  • Polymorphism, Single Nucleotide
  • Regression Analysis*
  • Statistics as Topic
  • Vital Capacity / genetics
  • Waist-Hip Ratio