Using pseudoalignment and base quality to accurately quantify microbial community composition

Mark Reppell; John Novembre

doi:10.1371/journal.pcbi.1006096

PLoS Comput Biol. 2018 Apr 16;14(4):e1006096. doi: 10.1371/journal.pcbi.1006096. eCollection 2018 Apr.

Authors

Mark Reppell¹, John Novembre¹

Affiliation

¹ Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America.

Abstract

Pooled DNA from multiple unknown organisms arises in a variety of contexts, for example microbial samples from ecological or human health research. Determining the composition of pooled samples can be difficult, especially at the scale of modern sequencing data and reference databases. Here we propose a novel method for taxonomic profiling in pooled DNA that combines the speed and low-memory requirements of k-mer based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. We apply the method to the problem of classifying 16S rRNA reads using a reference database of known organisms, a common challenge in microbiome research. Using simulations, we show the method is accurate across a variety of read lengths, with different length reference sequences, at different sample depths, and when samples contain reads originating from organisms absent from the reference. We also assess performance in real 16S data, where we reanalyze previous genetic association data to show our method discovers a larger number of quantitative trait associations than other widely used methods. We implement our method in the software Karp, for k-mer based analysis of read pools, to provide a novel combination of speed and accuracy that is uniquely suited for enhancing discoveries in microbial studies.
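The two ideas the abstract combines can be illustrated with a toy sketch. It is not Karp's implementation. The abstract does not give the algorithmic details, so everything here is an assumption made for illustration: the function names, the strict k-mer intersection rule, the ungapped best-offset likelihood, the uniform prior over references, and the two short reference sequences. Each base contributes 1 - e on a match and e/3 on a mismatch, where e = 10^(-Q/10) is the standard Phred error probability.

```python
import math
from collections import defaultdict

def build_index(refs, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in refs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def pseudoalign(read, index, k):
    """Intersect the reference sets of the read's k-mers.
    Assumption: k-mers absent from the index are skipped."""
    candidates = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            continue
        candidates = set(hits) if candidates is None else candidates & hits
    return candidates or set()

def log_likelihood(read, quals, ref):
    """Best ungapped log-likelihood of the read over all offsets in ref."""
    best = -math.inf
    for off in range(len(ref) - len(read) + 1):
        ll = 0.0
        for base, q, ref_base in zip(read, quals, ref[off:]):
            # Phred error probability, capped so the logs stay finite at Q = 0.
            err = min(10 ** (-q / 10), 0.75)
            ll += math.log(1 - err) if base == ref_base else math.log(err / 3)
        best = max(best, ll)
    return best

def posterior(read, quals, refs, candidates):
    """Normalize likelihoods over candidates (assumes a uniform prior)."""
    lls = {n: log_likelihood(read, quals, refs[n]) for n in candidates}
    top = max(lls.values())
    weights = {n: math.exp(v - top) for n, v in lls.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

# Hypothetical references that differ at a single position.
refs = {"A": "ACGTACGTTGCA", "B": "ACGTACGATGCA"}
index = build_index(refs, k=4)
read = "CGTACGT"  # every 4-mer of this read occurs in both references
candidates = pseudoalign(read, index, k=4)
high = posterior(read, [30] * 7, refs, candidates)       # confident final base
low = posterior(read, [30] * 6 + [2], refs, candidates)  # unreliable final base
```

In this toy case the read pseudoaligns to both references because k-mer matching ignores where each k-mer sits in the reference. The base-level likelihood then favors reference A strongly when the final base has high quality, and much less decisively when that base is flagged as unreliable.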

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Base Composition
Computational Biology
Computer Simulation
DNA / chemistry
DNA / genetics
Databases, Genetic
Humans
Microbial Consortia / genetics*
Microbiota / genetics
Quantitative Trait Loci
RNA, Ribosomal, 16S / genetics*
Sequence Alignment / statistics & numerical data
Software

Substances

RNA, Ribosomal, 16S
DNA
