Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions

Isabel Moreno-Indias; Leo Lahti; Miroslava Nedyalkova; Ilze Elbere; Gennady Roshchupkin; Muhamed Adilovic; Onder Aydemir; Burcu Bakir-Gungor; Enrique Carrillo-de Santa Pau; Domenica D'Elia; Mahesh S Desai; Laurent Falquet; Aycan Gundogdu; Karel Hron; Thomas Klammsteiner; Marta B Lopes; Laura Judith Marcos-Zambrano; Cláudia Marques; Michael Mason; Patrick May; Lejla Pašić; Gianvito Pio; Sándor Pongor; Vasilis J Promponas; Piotr Przymus; Julio Saez-Rodriguez; Alexia Sampri; Rajesh Shigdel; Blaz Stres; Ramona Suharoschi; Jaak Truu; Ciprian-Octavian Truică; Baiba Vilne; Dimitrios Vlachakis; Ercument Yilmaz; Georg Zeller; Aldert L Zomer; David Gómez-Cabrero; Marcus J Claesson

doi:10.3389/fmicb.2021.635781

Front Microbiol. 2021 Feb 22:12:635781. doi: 10.3389/fmicb.2021.635781. eCollection 2021.

Authors

Isabel Moreno-Indias¹,², Leo Lahti³, Miroslava Nedyalkova⁴, Ilze Elbere⁵, Gennady Roshchupkin⁶, Muhamed Adilovic⁷, Onder Aydemir⁸, Burcu Bakir-Gungor⁹, Enrique Carrillo-de Santa Pau¹⁰, Domenica D'Elia¹¹, Mahesh S Desai¹²,¹³, Laurent Falquet¹⁴,¹⁵, Aycan Gundogdu¹⁶,¹⁷, Karel Hron¹⁸, Thomas Klammsteiner¹⁹, Marta B Lopes²⁰,²¹, Laura Judith Marcos-Zambrano¹⁰, Cláudia Marques²², Michael Mason²³, Patrick May²⁴, Lejla Pašić²⁵, Gianvito Pio²⁶, Sándor Pongor²⁷, Vasilis J Promponas²⁸, Piotr Przymus²⁹, Julio Saez-Rodriguez³⁰, Alexia Sampri³¹, Rajesh Shigdel³², Blaz Stres³³,³⁴,³⁵, Ramona Suharoschi³⁶, Jaak Truu³⁷, Ciprian-Octavian Truică³⁸, Baiba Vilne³⁹, Dimitrios Vlachakis⁴⁰, Ercument Yilmaz⁴¹, Georg Zeller⁴², Aldert L Zomer⁴³, David Gómez-Cabrero⁴⁴, Marcus J Claesson⁴⁵

Affiliations

¹ Instituto de Investigación Biomédica de Málaga (IBIMA), Unidad de Gestión Clínica de Endocrinología y Nutrición, Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain.
² Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain.
³ Department of Computing, University of Turku, Turku, Finland.
⁴ Human Genetics and Disease Mechanisms, Latvian Biomedical Research and Study Centre, Riga, Latvia.
⁵ Latvian Biomedical Research and Study Centre, Riga, Latvia.
⁶ Department of Epidemiology, Erasmus Medical Center, Rotterdam, Netherlands.
⁷ Department of Genetics and Bioengineering, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina.
⁸ Department of Electrical and Electronics Engineering, Karadeniz Technical University, Trabzon, Turkey.
⁹ Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey.
¹⁰ Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain.
¹¹ Department for Biomedical Sciences, Institute for Biomedical Technologies, National Research Council, Bari, Italy.
¹² Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg.
¹³ Odense Research Center for Anaphylaxis, Department of Dermatology and Allergy Center, Odense University Hospital, University of Southern Denmark, Odense, Denmark.
¹⁴ Department of Biology, University of Fribourg, Fribourg, Switzerland.
¹⁵ Swiss Institute of Bioinformatics, Lausanne, Switzerland.
¹⁶ Department of Microbiology and Clinical Microbiology, Faculty of Medicine, Erciyes University, Kayseri, Turkey.
¹⁷ Metagenomics Laboratory, Genome and Stem Cell Center (GenKök), Erciyes University, Kayseri, Turkey.
¹⁸ Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia.
¹⁹ Department of Microbiology, University of Innsbruck, Innsbruck, Austria.
²⁰ NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal.
²¹ Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal.
²² CINTESIS, NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal.
²³ Computational Oncology, Sage Bionetworks, Seattle, WA, United States.
²⁴ Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg.
²⁵ Sarajevo Medical School, University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina.
²⁶ Department of Computer Science, University of Bari Aldo Moro, Bari, Italy.
²⁷ Faculty of Information Technology and Bionics, Pázmány University, Budapest, Hungary.
²⁸ Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus.
²⁹ Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland.
³⁰ Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Heidelberg, Germany.
³¹ Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom.
³² Department of Clinical Science, University of Bergen, Bergen, Norway.
³³ Jozef Stefan Institute, Ljubljana, Slovenia.
³⁴ Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia.
³⁵ Faculty of Civil and Geodetic Engineering, University of Ljubljana, Ljubljana, Slovenia.
³⁶ Molecular Nutrition and Proteomics Lab, Faculty of the Food Science and Technology, Institute of Life Sciences, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Cluj-Napoca, Romania.
³⁷ Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia.
³⁸ Department of Computer Science and Engineering, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania.
³⁹ Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia.
⁴⁰ Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece.
⁴¹ Department of Computer Technologies, Karadeniz Technical University, Trabzon, Turkey.
⁴² European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany.
⁴³ Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, Netherlands.
⁴⁴ Navarrabiomed, Complejo Hospitalario de Navarra (CHN), IdiSNA, Universidad Pública de Navarra (UPNA), Pamplona, Spain.
⁴⁵ School of Microbiology and APC Microbiome Ireland, University College Cork, Cork, Ireland.

Abstract

The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. While many of the challenges in microbiome research are shared with other high-throughput studies, quantitative analyses must also address the heterogeneity of the data, their specific statistical properties, and the remarkable variation in microbiome composition across individuals and body sites. This has led to a broad spectrum of statistical and machine learning challenges that range from study design, data processing, and standardization to analysis, modeling, cross-study comparison, prediction, data science ecosystems, and reproducible reporting. Although many statistical and machine learning approaches and tools have been developed, new techniques are still needed to deal with emerging applications and the vast heterogeneity of microbiome data. We review and discuss emerging applications of statistical and machine learning techniques in human microbiome studies and introduce the COST Action CA18131 "ML4Microbiome", which brings together microbiome researchers and machine learning experts to address current challenges such as the standardization of analysis pipelines for reproducible data analysis, the benchmarking and improvement of existing tools, and the development of new tools and ontologies.

Keywords: ML4Microbiome; biomarker identification; machine learning; microbiome; personalized medicine.

Copyright © 2021 Moreno-Indias, Lahti, Nedyalkova, Elbere, Roshchupkin, Adilovic, Aydemir, Bakir-Gungor, Santa Pau, D’Elia, Desai, Falquet, Gundogdu, Hron, Klammsteiner, Lopes, Marcos-Zambrano, Marques, Mason, May, Pašić, Pio, Pongor, Promponas, Przymus, Saez-Rodriguez, Sampri, Shigdel, Stres, Suharoschi, Truu, Truică, Vilne, Vlachakis, Yilmaz, Zeller, Zomer, Gómez-Cabrero and Claesson.