A high-dimensional omnibus test for set-based association analysis

Brief Bioinform. 2024 Jul 25;25(5):bbae456. doi: 10.1093/bib/bbae456.

Abstract

Set-based association analysis is a valuable tool for studying the etiology of complex diseases in genome-wide association studies, as it allows variants in a region or group to be tested jointly. Two types of single nucleotide polymorphism (SNP)-disease functional models are commonly recognized when evaluating the joint function of a set of SNPs: the cumulative weak signal model, in which many functional variants with small effects contribute to disease risk, and the dominating strong signal model, in which a few functional variants with large effects contribute to disease risk. However, existing methods have two main limitations that reduce their power. First, they typically consider only one SNP-disease association model, which can result in substantial power loss if the model is misspecified. Second, they do not account for the high-dimensional nature of SNPs, leading to low power or inflated false-positive rates. In this study, we address these challenges with a high-dimensional inference procedure that fits many SNPs simultaneously in a regression model. We also propose an omnibus testing procedure that employs a robust and powerful P-value combination method to enhance the power of SNP-set association testing. Extensive simulation studies show that our set-based high-dimensional inference strategy is both flexible and computationally efficient and can substantially improve the power of SNP-set association analysis. Application to a real dataset further demonstrates the utility of the testing strategy.
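To make the omnibus idea concrete, the following is a minimal sketch of how P-values from tests tuned to different signal models might be merged into a single SNP-set P-value. The abstract does not name the combination method, so the Cauchy combination rule used here is an assumption chosen for illustration, not necessarily the authors' procedure. The function name `cauchy_combine` and the input P-values are hypothetical.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine P-values with a Cauchy combination rule.

    Assumption: this rule is an illustrative stand-in; the abstract does not
    specify which P-value combination method the omnibus test uses.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    total_weight = sum(weights)
    eps = 1e-15  # keep P-values strictly inside (0, 1) so tan() stays finite
    statistic = 0.0
    for p, w in zip(p_values, weights):
        p = min(max(p, eps), 1.0 - eps)
        # Each P-value is mapped to a standard Cauchy variate under the null.
        statistic += (w / total_weight) * math.tan((0.5 - p) * math.pi)

    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical P-values for one SNP set: one from a test suited to many weak
# signals, one from a test suited to a few strong signals.
p_weak_signal = 0.60
p_strong_signal = 0.01
p_omnibus = cauchy_combine([p_weak_signal, p_strong_signal])
print(f"omnibus P-value: {p_omnibus:.4f}")
```

In this sketch the combined P-value stays close to the smaller input when only one of the component tests detects a signal, which is the kind of robustness to model misspecification that an omnibus test aims for.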

Keywords: P-value combination; SNP-set association; high-dimensional inference; omnibus test; variable screening.

MeSH terms

  • Algorithms
  • Computer Simulation
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study* / methods
  • Humans
  • Models, Genetic
  • Polymorphism, Single Nucleotide*