SimkaMin: fast and resource frugal de novo comparative metagenomics

Bioinformatics. 2020 Feb 15;36(4):1275-1276. doi: 10.1093/bioinformatics/btz685.

Abstract

Motivation: De novo comparative metagenomics is one of the most straightforward ways to analyze large sets of metagenomic data. Latest methods use the fraction of shared k-mers to estimate genomic similarity between read sets. However, those methods, while extremely efficient, are still limited by computational needs for practical usage outside of large computing facilities.

Results: We present SimkaMin, a quick comparative metagenomics tool with low disk and memory footprints, thanks to an efficient data subsampling scheme used to estimate Bray-Curtis and Jaccard dissimilarities. One billion metagenomic reads can be analyzed in <3 min, with tiny memory (1.09 GB) and disk (≈0.3 GB) requirements and without altering the quality of the downstream comparative analyses, making of SimkaMin a tool perfectly tailored for very large-scale metagenomic projects.

Availability and implementation: https://github.com/GATB/simka.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genomics
  • Metagenome
  • Metagenomics*
  • Sequence Analysis, DNA
  • Software*