Fast estimation of genetic correlation for biobank-scale data

Am J Hum Genet. 2022 Jan 6;109(1):24-32. doi: 10.1016/j.ajhg.2021.11.015. Epub 2021 Dec 2.

Abstract

Genetic correlation is an important parameter in efforts to understand the relationships among complex traits. Current methods that estimate genetic correlation from individual genotype data are challenging to scale to large datasets. Methods that analyze summary data, while computationally efficient, tend to yield estimates of genetic correlation with reduced precision. We propose SCORE (scalable genetic correlation estimator), a randomized method-of-moments estimator of genetic correlation that is both scalable and accurate. SCORE obtains more precise estimates of genetic correlations than summary-statistic methods that can be applied at scale; it achieves a 44% reduction in standard error relative to LD-score regression (LDSC) and a 20% reduction relative to high-definition likelihood (HDL) (averaged over all simulations). The efficiency of SCORE enables computation of genetic correlations on the UK Biobank dataset, consisting of ≈300K individuals and ≈500K SNPs, in a few hours (orders of magnitude faster than methods that analyze individual data, such as GCTA). Across 780 pairs of traits in 291,273 unrelated white British individuals in the UK Biobank, SCORE identifies significant genetic correlation between 200 additional pairs of traits over LDSC (beyond the 245 pairs identified by both).
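To make the phrase "randomized method of moments" concrete, the following is a minimal sketch of a generic bivariate method-of-moments estimator in which the trace of the squared genetic relationship matrix is approximated with random probe vectors. It is an illustration under stated assumptions, not the SCORE implementation: the function names (`simulate_traits`, `mom_genetic_correlation`), the simulation settings, and the number of probes are all hypothetical, and the abstract does not specify the estimator's details.

```python
import numpy as np


def simulate_traits(N=2000, M=500, h2=0.5, rg=0.5, seed=1):
    """Simulate standardized genotypes and two traits (toy settings, assumed)."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.3, size=(N, M)).astype(float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize each SNP
    # Per-SNP effects for the two traits, correlated with coefficient rg.
    cov = (h2 / M) * np.array([[1.0, rg], [rg, 1.0]])
    B = rng.multivariate_normal(np.zeros(2), cov, size=M)   # shape (M, 2)
    E = rng.standard_normal((N, 2)) * np.sqrt(1.0 - h2)     # independent noise
    Y = X @ B + E
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)            # standardize each trait
    return X, Y[:, 0], Y[:, 1]


def mom_genetic_correlation(X, y1, y2, n_probes=100, seed=0):
    """Generic randomized method-of-moments estimate of genetic correlation.

    Fits cov(y_a, y_b) = s_ab * K + e_ab * I by least squares, where
    K = X X^T / M, and never forms the N x N matrix K explicitly.
    Returns (estimated genetic correlation, estimated tr(K^2)).
    """
    N, M = X.shape
    rng = np.random.default_rng(seed)

    def K_mul(v):
        # K v computed as X (X^T v) / M, which costs O(N M) per vector.
        return X @ (X.T @ v) / M

    # Hutchinson-style trace estimate: tr(K^2) = E[||K z||^2], z ~ N(0, I).
    Z = rng.standard_normal((N, n_probes))
    KZ = K_mul(Z)
    tr_K2 = np.sum(KZ * KZ) / n_probes
    tr_K = np.sum(X * X) / M                            # exact tr(K)

    # Normal equations shared by the variance and covariance components.
    A = np.array([[tr_K2, tr_K], [tr_K, float(N)]])

    def genetic_component(u, v):
        rhs = np.array([u @ K_mul(v), u @ v])
        return np.linalg.solve(A, rhs)[0]               # first entry: genetic part

    var_g1 = genetic_component(y1, y1)
    var_g2 = genetic_component(y2, y2)
    cov_g = genetic_component(y1, y2)
    return cov_g / np.sqrt(var_g1 * var_g2), tr_K2


if __name__ == "__main__":
    X, y1, y2 = simulate_traits()
    rg_hat, _ = mom_genetic_correlation(X, y1, y2)
    print(f"estimated genetic correlation: {rg_hat:.3f} (simulated with rg = 0.5)")
```

The design point this sketch illustrates is that every step touches the genotype matrix only through matrix-vector products, so cost grows with the number of individuals times the number of SNPs times the number of probes, rather than with the square of the sample size.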

Keywords: biobank; complex traits; genetic correlation; method of moments; pleiotropy.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Biological Specimen Banks*
  • Genetic Association Studies*
  • Genetic Background*
  • Genetic Variation
  • Humans
  • Models, Genetic*
  • Multifactorial Inheritance
  • Phenotype*
  • Reproducibility of Results
  • United Kingdom