Polygenic risk scores (PRSs) have become an increasingly popular approach for demonstrating polygenic influences on complex traits and for establishing common polygenic signals between different traits. PRSs are typically constructed using pruning and thresholding (P+T), but the best choice of parameters is uncertain; in practice, multiple settings are tried and the best-performing one is chosen. Optimizing over settings in this way can inflate the Type I error rate. Permutation procedures can correct this, but they can be computationally intensive. Alternatively, a single parameter setting can be chosen a priori for the PRS, but choosing a suboptimal setting results in a loss of power. We propose computing PRSs under a range of parameter settings, performing principal component analysis (PCA) on the resulting set of PRSs, and using the first principal component (the first PRS-PC) in association tests. The first PC reweights the variants included in the PRS to achieve maximum variation over all PRS settings used. Using simulations and a real data application studying PRS association with bipolar disorder and with psychosis in bipolar disorder, we compare the performance of the proposed PRS-PCA approach with that of a permutation test and of an a priori selected p-value threshold. The PRS-PCA approach is simple to implement, outperforms the other strategies in most scenarios, and provides an unbiased estimate of prediction performance.
Keywords: permutation; polygenic risk scores; principal component analysis; weighting.
© 2020 Wiley Periodicals LLC.
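The PRS-PCA procedure summarized in the abstract (compute PRSs under several settings, run PCA on the set of PRSs, test the first PC) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `prs_pca_first_pc`, the simulated genotypes, effect sizes and p-values, the threshold grid, the column standardization before PCA, and the simple correlation-based test are all assumptions made for illustration; LD pruning is omitted entirely.

```python
import numpy as np


def prs_pca_first_pc(prs_matrix):
    """First principal component of a (samples x settings) matrix of PRSs.

    Hypothetical helper: standardizing columns before PCA is an assumption
    of this sketch, not something stated in the source.
    """
    prs = np.asarray(prs_matrix, dtype=float)
    # Standardize each PRS so that no single setting dominates by scale.
    z = (prs - prs.mean(axis=0)) / prs.std(axis=0, ddof=1)
    # SVD of the standardized matrix; rows of vt are the PC loadings.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[0]
    # PC signs are arbitrary; orient PC1 so its loadings sum to a
    # non-negative value (higher PC1 then tracks higher PRSs overall).
    if loadings.sum() < 0:
        loadings = -loadings
    pc1 = z @ loadings
    explained = s[0] ** 2 / np.sum(s ** 2)
    return pc1, loadings, explained


# Toy data (all simulated; no real GWAS summary statistics are used).
rng = np.random.default_rng(0)
n_samples, n_variants = 500, 1000
genotypes = rng.binomial(2, 0.3, size=(n_samples, n_variants)).astype(float)
betas = rng.normal(0.0, 0.05, size=n_variants)
# Evenly spread p-values in random order, so every threshold selects variants.
pvals = rng.permutation(np.linspace(1e-4, 1.0, n_variants))

# One PRS per p-value threshold (thresholding only; pruning is skipped here).
thresholds = [0.01, 0.05, 0.1, 0.5, 1.0]
prs_matrix = np.column_stack(
    [genotypes[:, pvals <= t] @ betas[pvals <= t] for t in thresholds]
)

pc1, loadings, explained = prs_pca_first_pc(prs_matrix)

# Simulated quantitative phenotype and a simple correlation-based test of PC1.
phenotype = genotypes @ betas + rng.normal(0.0, 2.0, size=n_samples)
r = np.corrcoef(pc1, phenotype)[0, 1]
t_stat = r * np.sqrt((n_samples - 2) / (1.0 - r ** 2))

print("PC1 loadings per threshold:", np.round(loadings, 3))
print("Variance explained by PC1:", round(float(explained), 3))
print("Correlation with phenotype:", round(float(r), 3), "t =", round(float(t_stat), 2))
```

Because the first PC is computed without reference to the phenotype, a single association test on `pc1` replaces the search over thresholds. In a real analysis the test would typically be a regression adjusting for covariates such as ancestry PCs, which this sketch leaves out.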