Multiple-testing correction in metabolome-wide association studies

BMC Bioinformatics. 2021 Feb 12;22(1):67. doi: 10.1186/s12859-021-03975-2.

Abstract

Background: The search for statistically significant relationships between molecular markers and outcomes is challenging when dealing with high-dimensional, noisy and collinear multivariate omics data, such as metabolomic profiles. Permutation procedures allow for the estimation of adjusted significance levels without assuming independence among metabolomic variables. Nevertheless, the complex non-normal structure of metabolic profiles and outcomes may bias the permutation results, leading to overly conservative threshold estimates, i.e. thresholds lower than those from a Bonferroni or Šidák correction.
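For reference, the two classical corrections named above can be computed directly. The following is a minimal sketch for M tests at family-wise error level alpha; the function names are illustrative and do not come from the paper.

```python
def bonferroni_threshold(alpha, m):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m


def sidak_threshold(alpha, m):
    """Per-test significance level under the Sidak correction,
    which is exact when the m tests are independent."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


if __name__ == "__main__":
    alpha, m = 0.05, 100
    print("Bonferroni:", bonferroni_threshold(alpha, m))
    print("Sidak:     ", sidak_threshold(alpha, m))
```

The Šidák threshold is always slightly larger than the Bonferroni one. A permutation-based threshold that falls below both implies an effective number of tests greater than M, which is the overly conservative behaviour described above.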

Methods: Within a univariate permutation procedure, we employ parametric simulation methods based on the multivariate (log-)Normal distribution to obtain adjusted significance levels that are consistent across different outcomes while effectively controlling the type I error rate. Next, we derive an alternative closed-form expression for the estimation of the number of non-redundant metabolic variates based on the spectral decomposition of their correlation matrix. The performance of the method is tested for different model parametrizations and across a wide range of correlation levels of the variates using synthetic and real data sets.
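A generic univariate permutation procedure of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it omits the parametric multivariate (log-)Normal simulation step, it tests each variate with a Pearson correlation, and it approximates the p-value with the Fisher z-transform. The significance level is taken as the alpha-quantile of the minimum p-values over permutations, and the effective number of tests as alpha divided by that level. All function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z-transform (normal approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def permutation_threshold(features, outcome, alpha=0.05, n_perm=200, seed=0):
    """Return (adjusted significance level, effective number of tests).

    features: list of M variates, each a list of n sample values.
    outcome:  list of n outcome values.
    """
    rng = random.Random(seed)
    n = len(outcome)
    y = list(outcome)
    min_p = []
    for _ in range(n_perm):
        rng.shuffle(y)  # break any feature-outcome association
        min_p.append(min(approx_p_value(pearson_r(f, y), n) for f in features))
    min_p.sort()
    k = max(int(alpha * n_perm) - 1, 0)  # empirical alpha-quantile
    level = min_p[k]
    return level, alpha / level


if __name__ == "__main__":
    gen = random.Random(1)
    n, m = 60, 20
    outcome = [gen.gauss(0, 1) for _ in range(n)]
    independent = [[gen.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    print(permutation_threshold(independent, outcome))
```

With independent variates the effective number of tests should come out near M, while strongly collinear variates should yield a much smaller value. The estimate is noisy for small `n_perm`, so a real analysis would use far more permutations than this sketch.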

Results: Both the permutation-based formulation and the more practical closed-form expression are found to give an effective indication of the number of independent metabolic effects exhibited by the system, while guaranteeing that the derived adjusted threshold is stable across outcome measures with diverse properties.
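The abstract does not state the closed-form expression itself. To illustrate the general idea of reading an effective number of tests off the eigenvalues of a correlation matrix, the sketch below uses a different, previously published estimator, that of Li and Ji (2005), which is not the expression derived in this paper. The eigenvalues are supplied for an equicorrelated matrix, whose spectrum is known analytically. In practice they would come from a numerical eigenvalue routine, and values very close to 1 may need rounding before the indicator is applied. The function names are illustrative.

```python
import math


def li_ji_effective_tests(eigenvalues):
    """Li and Ji (2005) estimator: sum over eigenvalues of
    I(|lam| >= 1) + (|lam| - floor(|lam|))."""
    total = 0.0
    for lam in eigenvalues:
        lam = abs(lam)
        total += (1.0 if lam >= 1.0 else 0.0) + (lam - math.floor(lam))
    return total


def equicorrelated_eigenvalues(m, rho):
    """Eigenvalues of an m x m correlation matrix with a common
    off-diagonal correlation rho: one eigenvalue 1 + (m - 1) * rho
    and m - 1 eigenvalues equal to 1 - rho."""
    return [1.0 + (m - 1) * rho] + [1.0 - rho] * (m - 1)


def adjusted_threshold(alpha, m_eff):
    """Sidak-type per-test level using the effective number of tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0):
        m_eff = li_ji_effective_tests(equicorrelated_eigenvalues(10, rho))
        print(rho, m_eff, adjusted_threshold(0.05, m_eff))
```

For ten variates, the estimate falls from 10 with no correlation to 1 with perfect correlation, so the adjusted threshold relaxes as redundancy among the variates increases.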

Keywords: Correlated tests; FWER; MWAS; MWSL; Multiple testing; Permutation.

MeSH terms

  • Genetic Markers / genetics
  • Metabolome*
  • Metabolomics* / methods
  • Models, Biological*
  • Statistical Distributions

Substances

  • Genetic Markers