Identification of Reliable Components in Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS): a Data-Driven Approach across Metabolic Processes

Sci Rep. 2015 Nov 4;5:15710. doi: 10.1038/srep15710.

Abstract

There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components in these methods remains problematic. Although several methods have been proposed, none has yet proven satisfactory. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on how reproducibly it appears, regardless of the number of components used in the calculation. Using a clustering method to perform this classification, we applied the idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between the conventional and modified methods, applied to proton nuclear magnetic resonance (¹H-NMR) spectral datasets derived from known standard mixtures and from biological mixtures (urine and feces of mice), showed that the modified method yields more plausible results. In particular, clusters containing little information were reliably detected. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results from metabolomics datasets.
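The core idea described above (rerun MCR-ALS with different numbers of components, group the resolved components from all runs, and score each group by how often it reappears) can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' implementation: the function names `mcr_als`, `cluster_components`, and `cluster_aided_mcr`, the cosine-similarity threshold of 0.95, and the reproducibility cutoff of 0.8 are all hypothetical. The ALS step enforces non-negativity by simple clipping, and the grouping uses a greedy cosine-similarity rule in place of the clustering method used in the paper.

```python
import numpy as np


def mcr_als(D, k, n_iter=200, seed=0):
    """Minimal MCR-ALS sketch: factor D (samples x variables) as C @ S.

    Non-negativity is imposed by clipping after each unconstrained
    least-squares step (a simplification; NNLS is a common alternative).
    """
    rng = np.random.default_rng(seed)
    S = rng.random((k, D.shape[1]))  # random non-negative initial spectra
    for _ in range(n_iter):
        # Solve D.T ~= S.T @ C.T for C, then clip negative concentrations.
        C = np.clip(np.linalg.lstsq(S.T, D.T, rcond=None)[0].T, 0.0, None)
        # Solve D ~= C @ S for S, then clip negative spectral intensities.
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0], 0.0, None)
    # Scale each spectrum to unit length and move the scale into C,
    # so the product C @ S is unchanged.
    scale = np.linalg.norm(S, axis=1, keepdims=True)
    scale = np.where(scale > 0, scale, 1.0)
    return C * scale.T, S / scale


def cluster_components(profiles_per_run, threshold=0.95):
    """Greedily group profiles from all runs by cosine similarity.

    Returns unit-length cluster centroids and, for each cluster, the
    fraction of runs contributing at least one member (its reproducibility).
    """
    centroids, runs_seen = [], []
    for run_idx, S in enumerate(profiles_per_run):
        for p in S:
            p = p / (np.linalg.norm(p) + 1e-12)
            sims = [c @ p / (np.linalg.norm(c) + 1e-12) for c in centroids]
            if sims and max(sims) >= threshold:
                j = int(np.argmax(sims))
                centroids[j] = centroids[j] + p  # accumulate member profiles
                runs_seen[j].add(run_idx)
            else:
                centroids.append(p.copy())  # start a new cluster
                runs_seen.append({run_idx})
    n_runs = len(profiles_per_run)
    reliability = [len(r) / n_runs for r in runs_seen]
    centroids = [c / (np.linalg.norm(c) + 1e-12) for c in centroids]
    return centroids, reliability


def cluster_aided_mcr(D, k_values, seeds, threshold=0.95, min_fraction=0.8):
    """Run MCR-ALS for every (k, seed) pair and flag reproducible clusters."""
    runs = [mcr_als(D, k, seed=s)[1] for k in k_values for s in seeds]
    centroids, reliability = cluster_components(runs, threshold)
    reliable = [i for i, r in enumerate(reliability) if r >= min_fraction]
    return centroids, reliability, reliable


# Usage on synthetic two-component "spectra" with non-overlapping peaks.
x = np.linspace(0.0, 10.0, 200)
pure = np.vstack([np.exp(-(x - 3.0) ** 2), np.exp(-(x - 7.0) ** 2)])
conc = np.random.default_rng(1).random((30, 2))
D = conc @ pure
centroids, reliability, reliable = cluster_aided_mcr(D, k_values=[2, 3], seeds=[0, 1, 2])
print("reproducibility per cluster:", np.round(reliability, 2))
print("reliable cluster indices:", reliable)
```

Because each cluster's score is the fraction of runs in which it appears, a component resolved only at one particular number of components receives a low score and would be flagged as unreliable, independently of which number of components is "correct."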

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Biomarkers / analysis
  • Cluster Analysis
  • Data Interpretation, Statistical
  • Discriminant Analysis
  • Feces / chemistry*
  • Least-Squares Analysis*
  • Metabolomics / methods
  • Metabolomics / statistics & numerical data
  • Mice
  • Mice, Inbred C3H
  • Mice, Inbred C57BL
  • Mice, Inbred DBA
  • Multivariate Analysis*
  • Principal Component Analysis / methods*
  • Proton Magnetic Resonance Spectroscopy / methods*
  • Reproducibility of Results
  • Urine / chemistry*

Substances

  • Biomarkers