Empirical estimation of genome-wide significance thresholds based on the 1000 Genomes Project data set

J Hum Genet. 2016 Oct;61(10):861-866. doi: 10.1038/jhg.2016.72. Epub 2016 Jun 16.

Abstract

To assess the statistical significance of associations between variants and traits, genome-wide association studies (GWAS) should employ an appropriate threshold that accounts for the massive burden of multiple testing in the study. Although most current studies set the genome-wide significance threshold at P=5.0 × 10⁻⁸, the adequacy of this value for individual populations has not been fully investigated. To empirically estimate thresholds for different ancestral populations, we conducted GWAS simulations using the 1000 Genomes Phase 3 data set for Africans (AFR), Europeans (EUR), Admixed Americans (AMR), East Asians (EAS) and South Asians (SAS). The estimated empirical genome-wide significance thresholds were Psig=3.24 × 10⁻⁸ (AFR), 9.26 × 10⁻⁸ (EUR), 1.83 × 10⁻⁷ (AMR), 1.61 × 10⁻⁷ (EAS) and 9.46 × 10⁻⁸ (SAS). We additionally conducted trans-ethnic meta-analyses across all populations (ALL) and across all populations except AFR (ΔAFR), which yielded Psig=3.25 × 10⁻⁸ (ALL) and 4.20 × 10⁻⁸ (ΔAFR). Our results indicate that the current threshold (P=5.0 × 10⁻⁸) is overly stringent for all ancestral populations except Africans; however, a more stringent threshold should be employed when conducting a meta-analysis, regardless of the presence of African samples.
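As a rough illustration of the comparison drawn in the abstract, the following Python sketch checks each reported threshold against the conventional P=5.0 × 10⁻⁸ and converts it into an implied number of independent tests. The family-wise error rate of 0.05 and the Bonferroni-style conversion are assumptions made for illustration only, not the estimation procedure used in the study, and the function names are hypothetical.

```python
# Thresholds exactly as reported in the abstract.
CONVENTIONAL = 5.0e-8
EMPIRICAL = {
    "AFR": 3.24e-8,
    "EUR": 9.26e-8,
    "AMR": 1.83e-7,
    "EAS": 1.61e-7,
    "SAS": 9.46e-8,
    "ALL": 3.25e-8,
    "ΔAFR": 4.20e-8,
}


def compare_to_conventional(threshold, conventional=CONVENTIONAL):
    """Say whether the conventional threshold is too strict or too lenient.

    A smaller P-value threshold is harder to pass. If the empirical threshold
    is larger than the conventional one, the conventional one is overly
    stringent for that group; if smaller, it is not stringent enough.
    """
    if threshold > conventional:
        return "conventional is overly stringent"
    if threshold < conventional:
        return "conventional is not stringent enough"
    return "conventional matches"


def implied_independent_tests(threshold, alpha=0.05):
    """Bonferroni-style implied test count (ASSUMPTION: alpha = 0.05)."""
    return alpha / threshold


for group, p_sig in EMPIRICAL.items():
    verdict = compare_to_conventional(p_sig)
    n_tests = implied_independent_tests(p_sig)
    print(f"{group:>5}: Psig={p_sig:.2e}  ~{n_tests:,.0f} implied tests  ({verdict})")
```

Under these assumptions, a smaller empirical threshold corresponds to a larger implied number of independent tests, which is consistent with AFR and the meta-analyses requiring thresholds stricter than the conventional value.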

MeSH terms

  • Computer Simulation
  • Datasets as Topic
  • Ethnicity / genetics
  • Genetics, Population / methods
  • Genome-Wide Association Study* / methods
  • Genomics* / methods
  • Humans
  • Linkage Disequilibrium
  • Meta-Analysis as Topic
  • Models, Genetic