A high-dimensional omnibus test for set-based association analysis

Haitao Yang; Xin Wang; Zechen Zhang; Fuzhao Chen; Hongyan Cao; Lina Yan; Xia Gao; Hui Dong; Yuehua Cui

doi:10.1093/bib/bbae456

Brief Bioinform. 2024 Jul 25;25(5):bbae456. doi: 10.1093/bib/bbae456.

Authors

Haitao Yang¹,²,³, Xin Wang¹, Zechen Zhang¹,², Fuzhao Chen¹, Hongyan Cao⁴, Lina Yan¹,², Xia Gao¹,², Hui Dong⁵, Yuehua Cui⁶

Affiliations

¹ Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China.
² Hebei Key Laboratory of Environment and Human Health, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China.
³ Hebei Key Laboratory of Forensic Medicine, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China.
⁴ Department of Health Statistics, Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, School of Public Health; MOE Key Laboratory of Coal Environmental Pathogenicity and Prevention, Shanxi Medical University, No 56 Xinjian South Rd., Taiyuan, Shanxi 030001, P.R. China.
⁵ Department of Neurology, Second Hospital of Hebei Medical University, 215 West Heping Road, Shijiazhuang, Hebei 050000, P.R. China.
⁶ Department of Statistics and Probability, Michigan State University, 619 Red Cedar Rd., East Lansing, MI 48824, United States.

Abstract

Set-based association analysis is a valuable tool for studying the etiology of complex diseases in genome-wide association studies, as it allows variants in a region or group to be tested jointly. Two common types of single nucleotide polymorphism (SNP)-disease functional models are recognized when evaluating the joint function of a set of SNPs: the cumulative weak signal model, in which multiple functional variants with small effects contribute to disease risk, and the dominating strong signal model, in which a few functional variants with large effects contribute to disease risk. However, existing methods have two main limitations that reduce their power. First, they typically consider only one disease-SNP association model, which can result in significant power loss if the model is misspecified. Second, they do not account for the high-dimensional nature of SNPs, leading to low power or inflated false-positive rates. In this study, we address these challenges with a high-dimensional inference procedure that fits many SNPs simultaneously in a regression model. We also propose an omnibus testing procedure that employs a robust and powerful P-value combination method to enhance the power of SNP-set association analysis. Extensive simulation studies demonstrate that our set-based high-dimensional inference strategy is both flexible and computationally efficient and can substantially improve the power of SNP-set association analysis. Application to a real dataset further demonstrates the utility of the testing strategy.
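The abstract does not name the P-value combination method used in the omnibus test. As a minimal sketch of the general idea only, the following assumes the Cauchy combination rule, one widely used way to merge P-values from tests aimed at different signal patterns. The function name `cauchy_combine` and the inputs `p_weak_signal` and `p_strong_signal` are hypothetical and are not taken from the paper.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine P-values with the Cauchy combination rule.

    Assumption: this rule is an illustrative stand-in; the abstract does
    not state which combination method the authors use.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    eps = 1e-15  # clip so tan() stays finite at p = 0 or p = 1
    stat = sum(
        (w / total) * math.tan((0.5 - min(max(p, eps), 1.0 - eps)) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Convert the statistic back to a P-value via the standard Cauchy tail.
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical P-values from two tests of the same SNP set:
p_weak_signal = 0.20     # e.g. a test tuned to many small effects
p_strong_signal = 0.003  # e.g. a test tuned to a few large effects
p_omnibus = cauchy_combine([p_weak_signal, p_strong_signal])
```

Under this rule the combined P-value is driven mainly by the smallest input, which is why an omnibus test of this kind can remain powerful when only one of the two signal models fits the data.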

Keywords: P-value combination; SNP–set association; high-dimensional inference; omnibus test; variable screening.

MeSH terms

Algorithms
Computer Simulation
Genetic Predisposition to Disease
Genome-Wide Association Study* / methods
Humans
Models, Genetic
Polymorphism, Single Nucleotide*
