KOMB: K-core based de novo characterization of copy number variation in microbiomes

Comput Struct Biotechnol J. 2022 Jun 17;20:3208-3222. doi: 10.1016/j.csbj.2022.06.019. eCollection 2022.

Abstract

Characterizing metagenomes via k-mer-based, database-dependent taxonomic classification has yielded key insights into underlying microbiome dynamics. However, novel approaches are needed to track community dynamics and genomic flux within metagenomes, particularly in response to perturbations. We describe KOMB, a novel method for tracking genome-level dynamics within microbiomes. KOMB uses K-core decomposition to identify structural variations (SVs), specifically population-level copy number variation (CNV), within microbiomes. K-core decomposition partitions the graph constructed from a metagenomic sample into nested shells, where the K-core is the largest subgraph in which every node has induced degree at least K; this yields reduced computational complexity compared to prior approaches. Through validation on a synthetic community, we show that KOMB recovers and profiles repetitive genomic regions in the sample. We also show that KOMB identifies functionally important regions in Human Microbiome Project datasets, and we used it to analyze longitudinal data and identify keystone taxa in fecal microbiota transplantation (FMT) samples. In summary, KOMB is a novel graph-based, taxonomy-oblivious, and reference-free approach for tracking CNV within microbiomes. KOMB is open source and available for download at https://gitlab.com/treangenlab/komb.
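To make the shell idea concrete, here is a minimal, generic sketch of K-core decomposition using the standard peeling procedure. It is not KOMB's implementation: the function names `core_numbers` and `k_shells` and the toy graph are hypothetical, and how KOMB builds its graph from reads (the keywords point to unitigs) is not modeled here. The sketch assumes an undirected graph without self-loops, given as a dict from each node to its set of neighbours. A node's core number is the largest K such that the node belongs to the K-core, and the K-shell is the set of nodes whose core number is exactly K.

```python
from collections import defaultdict


def core_numbers(adj):
    """Core number of every node in an undirected, simple graph.

    adj maps each node to the set of its neighbours (no self-loops).
    Peeling: repeatedly delete a node of minimum remaining degree; the
    running maximum of those minimum degrees is the deleted node's core number.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    core = {}
    k = 0
    while len(removed) < len(adj):
        # Node with the smallest degree among those not yet peeled off.
        v = min((u for u in adj if u not in removed), key=lambda u: degree[u])
        k = max(k, degree[v])
        core[v] = k
        removed.add(v)
        # Deleting v lowers the degree of its remaining neighbours.
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
    return core


def k_shells(core):
    """Group nodes into shells: shell k holds the nodes whose core number is exactly k."""
    shells = defaultdict(list)
    for v, k in core.items():
        shells[k].append(v)
    return {k: sorted(nodes) for k, nodes in sorted(shells.items())}


if __name__ == "__main__":
    # Hypothetical toy graph: triangle A-B-C, with a path A-D-E hanging off A.
    edges = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D"), ("D", "E")]
    adj = defaultdict(set)
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    print(k_shells(core_numbers(adj)))  # {1: ['D', 'E'], 2: ['A', 'B', 'C']}
```

Here the triangle forms the 2-core, while D and E are peeled off in the 1-shell. This version scans all remaining nodes at each step, so it runs in O(V^2) time and is meant only for readability; the Batagelj and Zaversnik bucket algorithm computes the same core numbers in time linear in the number of edges, and NetworkX exposes an equivalent `core_number` function.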

Keywords: CDI, Clostridium difficile infection; CNV, Copy Number Variation; Copy number variation (CNV); DBG, De Bruijn Graphs; De Bruijn graph; ENA, European Nucleotide Archive; FMT, Fecal Microbiota Transplantation; FPR, False Positive Rate; Functional characterization; GO, Gene Ontology; GPL, (GNU) General Public License; Graph-based analysis; K-core decomposition; MAGs, Metagenome assembled genomes; Metagenome; ROC, Receiver Operating Characteristic; Repeats; SAM, Sequence Alignment/Map; SRA, Sequence Read Archive; SVs, Structural Variants; TPR, True Positive Rate; Unitigs.