Pretreating and normalizing metabolomics data for statistical analysis

Genes Dis. 2023 Jul 7;11(3):100979. doi: 10.1016/j.gendis.2023.04.018. eCollection 2024 May.

Abstract

Metabolomics, as both a research field and a set of techniques, studies the complete set of small molecules in biological samples. It is emerging as a powerful tool for precision medicine in general. In particular, integrating microbiome and metabolome data has revealed mechanisms and functions of the microbiome in human health and disease. However, metabolomics data are highly complex, and preprocessing/pretreatment and normalization procedures are usually required before statistical analysis. In this review article, we comprehensively review the methods used to preprocess and pretreat metabolomics data, including preprocessing of mass spectrometry (MS)-based and nuclear magnetic resonance (NMR)-based data, handling of zero and/or missing values, outlier detection, data normalization, data centering and scaling, and data transformation. We discuss the advantages and limitations of each method. The choice of a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the data set, and the selected statistical analysis method. Finally, we offer a perspective on the applications of these methods in microbiome and metabolome research.
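The abstract names several pretreatment steps without showing how they fit together. The Python/NumPy sketch below chains one common, simple choice for each step: half-minimum imputation of zeros and missing values, total-sum normalization of each sample, log2 transformation, and mean centering followed by either autoscaling or Pareto scaling. These particular choices and all function names are illustrative assumptions, not methods the review prescribes. As the abstract stresses, the appropriate choice depends on the hypothesis, the data set, and the downstream statistical method. The sketch also assumes a samples-by-metabolites matrix of non-negative intensities in which every metabolite has at least one non-zero observation.

```python
import numpy as np


def impute_half_min(X):
    """Replace zeros and NaNs in each feature (column) with half of that
    feature's smallest observed non-zero value (a common heuristic).
    Assumes non-negative intensities; returns a copy."""
    X = np.array(X, dtype=float)  # copy so the caller's array is untouched
    missing = np.isnan(X) | (X == 0)
    for j in range(X.shape[1]):
        observed = X[~missing[:, j], j]
        if observed.size == 0:
            raise ValueError(f"feature {j} has no non-zero observations")
        X[missing[:, j], j] = observed.min() / 2.0
    return X


def total_sum_normalize(X):
    """Divide each sample (row) by its total intensity so each row sums to 1."""
    return X / X.sum(axis=1, keepdims=True)


def log_transform(X):
    """Log2 transform to reduce right skew; requires strictly positive values."""
    return np.log2(X)


def center_and_scale(X, method="auto"):
    """Mean-center each feature, then scale it.
    'auto'   : divide by the standard deviation (unit variance per feature).
    'pareto' : divide by the square root of the standard deviation.
    """
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    if method == "auto":
        return centered / sd
    if method == "pareto":
        return centered / np.sqrt(sd)
    raise ValueError(f"unknown scaling method: {method}")


def pretreat(X, scaling="auto"):
    """Hypothetical end-to-end pipeline: impute -> normalize -> log -> scale."""
    X = impute_half_min(X)
    X = total_sum_normalize(X)
    X = log_transform(X)
    return center_and_scale(X, method=scaling)


if __name__ == "__main__":
    # Made-up intensities: 4 samples (rows) x 3 metabolites (columns),
    # containing one zero and one missing value.
    raw = np.array([
        [120.0, 0.0, 35.0],
        [150.0, 8.0, np.nan],
        [90.0, 12.0, 40.0],
        [200.0, 10.0, 30.0],
    ])
    print(pretreat(raw, scaling="pareto").round(3))
```

The order of the steps matters in this sketch: zeros must be replaced before the log transformation, because log(0) is undefined. Autoscaling gives every metabolite equal weight regardless of abundance, whereas Pareto scaling shrinks but does not remove the influence of large-variance features.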

Keywords: Data centering and scaling; Data normalization; Data transformation; MS-based data preprocessing; Missing values; NMR data preprocessing; Outliers; Preprocessing/pretreatment.

Publication types

  • Review