Performance of Gut Microbiome as an Independent Diagnostic Tool for 20 Diseases: Cross-Cohort Validation of Machine-Learning Classifiers

Gut Microbes. 2023 Jan-Dec;15(1):2205386. doi: 10.1080/19490976.2023.2205386.

Abstract

Cross-cohort validation is essential for gut microbiome-based disease stratification but has been performed for only a limited number of diseases. Here, we systematically evaluated the cross-cohort performance of gut microbiome-based machine-learning classifiers for 20 diseases. Using single-cohort classifiers, we obtained high predictive accuracies in intra-cohort validation (~0.77 AUC) but low accuracies in cross-cohort validation, except for intestinal diseases (~0.73 AUC). We then built combined-cohort classifiers, trained on samples pooled from multiple cohorts, to improve validation performance for non-intestinal diseases, and estimated the sample size required to achieve validation accuracies above 0.7. In addition, for intestinal diseases, classifiers using metagenomic data achieved higher validation performance than those using 16S amplicon data. We further quantified cross-cohort marker consistency using a Marker Similarity Index and observed similar trends. Together, our results support the gut microbiome as an independent diagnostic tool for intestinal diseases and reveal strategies to improve cross-cohort performance based on the identified determinants of consistent cross-cohort gut microbiome alterations.
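The intra-cohort, cross-cohort, and combined-cohort validation schemes described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the classifier, fold count, or how AUC is aggregated, so a simple nearest-centroid classifier stands in for the machine-learning models, intra-cohort AUC is computed from pooled out-of-fold scores of a stratified k-fold split, and all function names (`auc`, `fit_centroids`, `intra_cohort_auc`, `cross_cohort_matrix`, `leave_one_cohort_out`, `make_cohort`) are hypothetical. The data are synthetic Gaussian features, not microbial abundances. The Marker Similarity Index and the sample-size estimation are omitted because the abstract does not define them.

```python
import random
from statistics import mean

def auc(labels, scores):
    """ROC AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_centroids(X, y):
    """Hypothetical stand-in classifier: mean feature profile of cases and of controls."""
    cases = [x for x, lab in zip(X, y) if lab == 1]
    controls = [x for x, lab in zip(X, y) if lab == 0]
    return [mean(col) for col in zip(*cases)], [mean(col) for col in zip(*controls)]

def score(model, x):
    """Positive when x is closer (squared Euclidean) to the case centroid than to the control one."""
    case_c, ctrl_c = model
    d_case = sum((a - b) ** 2 for a, b in zip(x, case_c))
    d_ctrl = sum((a - b) ** 2 for a, b in zip(x, ctrl_c))
    return d_ctrl - d_case

def intra_cohort_auc(X, y, k=5, seed=0):
    """Stratified k-fold CV inside one cohort; AUC of the pooled out-of-fold scores."""
    rng = random.Random(seed)
    fold = [0] * len(y)
    for label in (0, 1):
        members = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            fold[i] = j % k  # round-robin spreads each class across the folds
    scores = [0.0] * len(y)
    for f in range(k):
        train = [i for i in range(len(y)) if fold[i] != f]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in range(len(y)):
            if fold[i] == f:
                scores[i] = score(model, X[i])
    return auc(y, scores)

def cross_cohort_matrix(cohorts):
    """cohorts: name -> (X, y). Diagonal: intra-cohort CV; off-diagonal: train on a, test on b."""
    result = {}
    for a, (Xa, ya) in cohorts.items():
        model = fit_centroids(Xa, ya)  # single-cohort classifier trained on all of cohort a
        for b, (Xb, yb) in cohorts.items():
            if a == b:
                result[(a, b)] = intra_cohort_auc(Xa, ya)
            else:
                result[(a, b)] = auc(yb, [score(model, x) for x in Xb])
    return result

def leave_one_cohort_out(cohorts):
    """Combined-cohort variant: pool all cohorts except one, then test on the held-out cohort."""
    out = {}
    for held, (Xh, yh) in cohorts.items():
        X, y = [], []
        for name, (Xc, yc) in cohorts.items():
            if name != held:
                X.extend(Xc)
                y.extend(yc)
        model = fit_centroids(X, y)
        out[held] = auc(yh, [score(model, x) for x in Xh])
    return out

def make_cohort(n, effect, seed):
    """Synthetic cohort (not real microbiome data): n controls then n cases, 3 features,
    with cases shifted by `effect` on feature 0."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n):
            X.append([rng.gauss(effect * label, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y

cohorts = {f"cohort{i}": make_cohort(30, 1.5, seed=i) for i in range(3)}
matrix = cross_cohort_matrix(cohorts)
loco = leave_one_cohort_out(cohorts)
for (train_name, test_name), value in sorted(matrix.items()):
    print(f"train={train_name} test={test_name} AUC={value:.2f}")
for held, value in loco.items():
    print(f"held-out={held} AUC={value:.2f}")
```

Each off-diagonal entry of the matrix corresponds to a single-cohort classifier applied to a different cohort, while the leave-one-cohort-out loop corresponds to a combined-cohort classifier evaluated on a cohort it never saw during training. Replacing `fit_centroids` and `score` with a real model (for example, a random forest on taxonomic profiles) would leave the validation logic unchanged.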

Keywords: Gut metagenome; autoimmune disease; cross-cohort validation; intestinal disease; liver disease; machine learning; mental disease; metabolic disease; nervous system diseases; patient stratification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gastrointestinal Microbiome* / genetics
  • Humans
  • Machine Learning
  • Metagenome
  • Metagenomics / methods
  • Research Design

Grants and funding

This research was supported by the National Key Research and Development Program of China (2019YFA0905600 to W.H.C., 2020YFA0712403 to X.M.Z.), the National Natural Science Foundation of China (32070660 to W.H.C.; T2225015 and 61932008 to X.M.Z.), the NNSF-VR Sino-Swedish Joint Research Programme (82161138017), the Greater Bay Area Institute of Precision Medicine (Guangzhou) (Grant No. IPM21C008), and the Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01), Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (LCNBI) and ZJLab.