Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox

Genome Biol. 2021 Mar 30;22(1):93. doi: 10.1186/s13059-021-02306-1.

Abstract

The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing issues. To address these issues, we developed SIAMCAT, a versatile R toolbox for ML-based comparative metagenomics. We demonstrate its capabilities in a meta-analysis of fecal metagenomic studies (10,803 samples). When naively transferred across studies, ML models lost accuracy and disease specificity, which could, however, be resolved by a novel training set augmentation strategy. This reveals some biomarkers to be disease-specific, with others shared across multiple conditions. SIAMCAT is freely available from siamcat.embl.de.

Keywords: Machine learning; Meta-analysis; Microbiome data analysis; Microbiome-wide association studies (MWAS); Statistical modeling.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Confounding Factors, Epidemiologic
  • Crohn Disease / etiology
  • Databases, Genetic
  • Gastrointestinal Microbiome
  • Humans
  • Machine Learning*
  • Meta-Analysis as Topic
  • Metagenome*
  • Metagenomics / methods*
  • Microbiota*
  • Models, Statistical
  • ROC Curve
  • Software*
  • Workflow