SimkaMin: fast and resource frugal de novo comparative metagenomics

Gaëtan Benoit; Mahendra Mariadassou; Stéphane Robin; Sophie Schbath; Pierre Peterlongo; Claire Lemaitre

doi:10.1093/bioinformatics/btz685

Bioinformatics. 2020 Feb 15;36(4):1275-1276. doi: 10.1093/bioinformatics/btz685.

Authors

Gaëtan Benoit¹, Mahendra Mariadassou², Stéphane Robin³, Sophie Schbath², Pierre Peterlongo¹, Claire Lemaitre¹

Affiliations

¹ Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France.
² MaIAGE, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France.
³ UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, 75005 Paris, France.

PMID: 31504187

Abstract

Motivation: De novo comparative metagenomics is one of the most straightforward ways to analyze large sets of metagenomic data. The latest methods use the fraction of shared k-mers to estimate genomic similarity between read sets. However, although extremely efficient, these methods still have computational requirements that limit their practical use outside large computing facilities.

Results: We present SimkaMin, a fast comparative metagenomics tool with low disk and memory footprints, achieved through an efficient data subsampling scheme used to estimate Bray-Curtis and Jaccard dissimilarities. One billion metagenomic reads can be analyzed in <3 min, with tiny memory (1.09 GB) and disk (≈0.3 GB) requirements and without altering the quality of the downstream comparative analyses, making SimkaMin well suited to very large-scale metagenomic projects.
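
To make the idea of estimating dissimilarities from a k-mer subsample concrete, the following Python sketch may help. It is an illustration only and not SimkaMin's implementation: the function names, the BLAKE2b hash, and the choice to keep the k-mers with the smallest hash values over the union of all samples are assumptions made for this example. It computes the Bray-Curtis dissimilarity from k-mer abundances and the Jaccard dissimilarity from k-mer presence/absence, restricted to the subsampled k-mers.

```python
import hashlib
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_hash(kmer):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def subsample(counts_by_sample, sketch_size):
    """Assumed scheme: keep the sketch_size k-mers with the smallest hash
    values over the union of all samples, so every sample is restricted
    to the same subset of k-mers."""
    union = set().union(*counts_by_sample)
    kept = sorted(union, key=kmer_hash)[:sketch_size]
    return [
        {kmer: counts[kmer] for kmer in kept if kmer in counts}
        for counts in counts_by_sample
    ]


def bray_curtis(a, b):
    """Abundance-based dissimilarity: 1 - 2 * sum(min) / (total_a + total_b)."""
    shared = sum(min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def jaccard(a, b):
    """Presence/absence dissimilarity: 1 - |A and B| / |A or B|."""
    union = a.keys() | b.keys()
    return 1.0 - len(a.keys() & b.keys()) / len(union) if union else 0.0
```

For example, `sub_a, sub_b = subsample([kmer_counts(reads_a, 21), kmer_counts(reads_b, 21)], 1_000_000)` followed by `bray_curtis(sub_a, sub_b)` would give an estimate based on at most one million distinct k-mers; the value k = 21 and the sketch size here are placeholders, not recommendations from the paper. When the sketch size is at least the number of distinct k-mers, the estimate coincides with the exact value.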

Availability and implementation: https://github.com/GATB/simka.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Genomics
Metagenome
Metagenomics*
Sequence Analysis, DNA
Software*