Background: Genetic association studies generate large amounts of data, usually in the form of single nucleotide polymorphisms in candidate genes. Analyzing such data is challenging and raises issues of multiple comparisons and potential false-positive associations. Using data from a case-control study of bladder cancer, we showed how hierarchical modeling can be used in genetic epidemiologic studies with multiple markers to control overestimation of effects and potential false-positive associations.
Methods: The data were first analyzed with the conventional approach of estimating each main effect individually. We subsequently employed hierarchical modeling by adding a second-stage (prior) model that incorporated information on the potential function of the genes. We first used an empirical-Bayes approach, in which the residual effects of the genes were estimated from the data. When the residual effects were estimated to be zero, we instead used a semi-Bayes approach, in which they were pre-specified. We also explored the impact of using different second-stage design matrices. Finally, we used two approaches for assessing gene-environment interactions. The first approach added product terms to the first-stage model. The second approach used three indicators, for subjects exposed to the genetic factor only, to the environmental factor only, and to both genetic and environmental factors.
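The second-stage shrinkage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function name `semi_bayes_shrink`, the pathway-indicator design matrix `Z`, the prior variance `tau2`, and all numeric inputs are hypothetical, and only point estimates are computed. It assumes first-stage log odds ratios `b_hat` with covariance `V`, and a second-stage model in which the true effects equal `Z @ pi` plus residual effects with pre-specified variance `tau2`.

```python
import numpy as np

def semi_bayes_shrink(b_hat, V, Z, tau2):
    """Shrink first-stage estimates toward second-stage (prior) means.

    b_hat : first-stage log odds ratios, shape (n,)
    V     : their estimated covariance matrix, shape (n, n)
    Z     : second-stage design matrix, shape (n, p)
    tau2  : pre-specified residual (prior) variance, a scalar > 0
    """
    b_hat = np.asarray(b_hat, dtype=float)
    V = np.asarray(V, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n = b_hat.shape[0]

    # Weight matrix: inverse of first-stage variance plus prior variance.
    W = np.linalg.inv(V + tau2 * np.eye(n))
    # Weighted least-squares estimate of the second-stage coefficients.
    pi_hat = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ b_hat)
    prior_mean = Z @ pi_hat
    # Shrinkage matrix: noisier first-stage estimates are pulled harder.
    B = V @ W
    shrunk = prior_mean + (np.eye(n) - B) @ (b_hat - prior_mean)
    return shrunk, prior_mean

# Hypothetical inputs: four genotypes grouped into two pathways.
b_hat = np.log([3.0, 1.2, 0.9, 1.1])
V = np.diag([0.20, 0.05, 0.05, 0.10])
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

shrunk, prior_mean = semi_bayes_shrink(b_hat, V, Z, tau2=0.17)
print("conventional ORs:", np.exp(b_hat).round(2))
print("shrunken ORs:    ", np.exp(shrunk).round(2))
```

With indicator columns in `Z`, each estimate is pulled toward the weighted mean of its own pathway, and the pull is strongest for estimates with large first-stage variance. An empirical-Bayes variant would estimate `tau2` from the data instead of fixing it.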
Results: With the second-stage covariates pre-specified, the estimates were shrunk toward the mean of each pathway. The conventional model detected a number of positive associations, which were attenuated under the hierarchical model. For example, the odds ratio for myeloperoxidase (G/G, G/A) genotype changed from 3.17 [95% confidence interval (CI), 1.32-7.59] to 1.64 (95% CI, 0.81-3.34). A similar phenomenon was observed for the gene-environment interactions. The odds ratio for the gene-environment interaction between tobacco smoking and N-acetyltransferase 1 fast genotype was 2.74 (95% CI, 0.68-11.0) from the conventional analysis and 1.24 (95% CI, 0.80-1.93) from the hierarchical model.
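The gain in precision can be read off the reported intervals. The sketch below recovers an approximate standard error of each log odds ratio from its 95% CI, assuming the intervals are Wald-type and symmetric on the log scale (an assumption, since the interval method is not stated here). The helper name `log_or_and_se` is hypothetical; the odds ratios and limits are the ones quoted above.

```python
import math

def log_or_and_se(odds_ratio, lower, upper, z=1.96):
    """Return (log OR, approximate SE) from an OR and its 95% CI.

    Assumes a Wald interval: log(OR) +/- z * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return log_or, se

# Odds ratios and 95% CIs as reported in the Results.
reported = {
    "MPO, conventional": (3.17, 1.32, 7.59),
    "MPO, hierarchical": (1.64, 0.81, 3.34),
    "NAT1 x smoking, conventional": (2.74, 0.68, 11.0),
    "NAT1 x smoking, hierarchical": (1.24, 0.80, 1.93),
}

standard_errors = {}
for label, (or_, lo, hi) in reported.items():
    log_or, se = log_or_and_se(or_, lo, hi)
    standard_errors[label] = se
    print(f"{label}: log OR = {log_or:.2f}, SE = {se:.2f}")
```

Under this assumption, the hierarchical estimates have smaller standard errors than the conventional ones for both examples, with the larger reduction for the interaction term.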
Conclusion: Adding a second-stage model through hierarchical modeling can reduce the likelihood of false positives via shrinkage toward the prior mean and improve risk estimation by increasing precision. It therefore represents an alternative to conventional methods for genetic association studies.