DoEstRare: A statistical test to identify local enrichments in rare genomic variants associated with disease

PLoS One. 2017 Jul 24;12(7):e0179364. doi: 10.1371/journal.pone.0179364. eCollection 2017.

Abstract

Next-generation sequencing technologies have made it possible to assay the effect of rare variants on complex diseases. As an extension of the "common disease-common variant" paradigm, rare variant studies are necessary to gain more complete insight into the genetic architecture of human traits. Association studies of these rare variants pose new challenges for statistical analysis. Because of their low frequency, rare variants must be tested in groups. This approach is hindered by the fact that an unknown proportion of the variants could be neutral. The risk level of a rare variant may be determined by its impact but also by its position in the protein sequence. More generally, the molecular mechanisms underlying the disease architecture may involve specific protein domains or inter-genic regulatory regions. While a wide variety of methods optimize functionality weights for each individual marker, few evaluate differences in variant position between cases and controls. Here, we propose a test called DoEstRare, which aims to simultaneously detect clusters of disease risk variants and global allele frequency differences in genomic regions. This test estimates, for cases and controls, variant position densities in the genetic region by a kernel method, weighted by a function of allele frequencies. We compared DoEstRare with previously published strategies through simulation studies and re-analysis of real datasets. In simulations under various scenarios, DoEstRare was the only test to consistently show the highest performance in terms of type I error and power, whether or not variants were clustered. DoEstRare was also applied to Brugada syndrome and early-onset Alzheimer's disease data and provided results complementary to those of other existing tests. By integrating variant position information, DoEstRare offers new opportunities to explain disease susceptibility. DoEstRare is implemented in a user-friendly R package.
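
To make the idea of frequency-weighted position densities concrete, the following is a minimal Python sketch and not the DoEstRare implementation or its R package interface. Everything beyond what the abstract states is an assumption: the Gaussian kernel, the use of raw group allele frequencies as weights, the statistic (integrated absolute difference between the case and control curves), the permutation p-value, and all function and parameter names.

```python
import numpy as np


def weighted_intensity(positions, weights, grid, bandwidth):
    """Sum of Gaussian kernels centred on variant positions, one weight per variant."""
    positions = np.asarray(positions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    z = (grid[:, None] - positions[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernel @ weights


def position_statistic(genotypes, is_case, positions, grid, bandwidth):
    """Hypothetical statistic: integrated |case curve - control curve|.

    genotypes: (n_individuals, n_variants) array of minor-allele counts (0, 1, 2).
    Weights are the allele frequencies within each group (an assumption).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    case_freq = genotypes[is_case].mean(axis=0) / 2.0
    ctrl_freq = genotypes[~is_case].mean(axis=0) / 2.0
    diff = (weighted_intensity(positions, case_freq, grid, bandwidth)
            - weighted_intensity(positions, ctrl_freq, grid, bandwidth))
    step = grid[1] - grid[0]  # grid is assumed to be evenly spaced
    return float(np.sum(np.abs(diff)) * step)


def permutation_p_value(genotypes, is_case, positions, grid, bandwidth,
                        n_perm=200, seed=0):
    """Permutation p-value obtained by shuffling case/control labels."""
    rng = np.random.default_rng(seed)
    is_case = np.asarray(is_case, dtype=bool)
    observed = position_statistic(genotypes, is_case, positions, grid, bandwidth)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(is_case)
        if position_statistic(genotypes, shuffled, positions, grid, bandwidth) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because the weights are left unnormalised in this sketch, the statistic responds both to where variants fall in the region and to overall frequency differences between groups, which mirrors the two signals the abstract describes; the actual weighting function used by DoEstRare is not specified here.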

MeSH terms

  • Alzheimer Disease / genetics
  • Biostatistics
  • Brugada Syndrome / genetics
  • Case-Control Studies
  • Computer Simulation
  • Gene Frequency
  • Genetic Association Studies / statistics & numerical data*
  • Genetic Predisposition to Disease
  • Genetic Variation*
  • Genome-Wide Association Study
  • Genomics
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Models, Genetic

Grants and funding

This work was supported by a grant from the French Regional Council of Pays de la Loire (RFI VaCaRMe: Recherche, Formation et Innovation, Vaincre les maladies Cardiovasculaires, Respiratoires et Métaboliques). The project website is available at http://www.vacarme-project.org/.