MetaPhinder: Identifying Bacteriophage Sequences in Metagenomic Data Sets

PLoS One. 2016 Sep 29;11(9):e0163111. doi: 10.1371/journal.pone.0163111. eCollection 2016.

Abstract

Bacteriophages are the most abundant biological entity on the planet, yet because of their small genome sizes they account for only a small share of the genetic material isolated from most environments. They also show great genetic diversity and have mosaic genomes, which makes them challenging to analyze and understand. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e., contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole-genome bacteriophage sequences and integrates hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology, https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.
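
To illustrate why integrating hits to multiple genomes can matter for mosaic genomes, the following is a minimal sketch, not the MetaPhinder implementation. It assumes each BLAST hit has already been reduced to a (start, end) interval on the contig (1-based, inclusive), and it compares the contig fraction covered by the best single hit with the fraction covered once hits from all database genomes are merged. The function names and the example coordinates are hypothetical, and the abstract does not specify how MetaPhinder scores the integrated hits.

```python
from typing import List, Tuple

# A hit is a (start, end) interval on the contig, 1-based and inclusive.
# In practice these would come from BLAST query coordinates; here they are
# hypothetical values chosen for illustration.
Hit = Tuple[int, int]


def best_single_hit_coverage(hits: List[Hit], contig_length: int) -> float:
    """Fraction of the contig covered by the longest individual hit."""
    if not hits or contig_length <= 0:
        return 0.0
    longest = max(abs(end - start) + 1 for start, end in hits)
    return longest / contig_length


def merged_coverage(hits: List[Hit], contig_length: int) -> float:
    """Fraction of the contig covered after merging overlapping hits,
    regardless of which database genome each hit came from."""
    if not hits or contig_length <= 0:
        return 0.0
    # Normalise orientation and sort by start position.
    intervals = sorted((min(s, e), max(s, e)) for s, e in hits)
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start > cur_end:
            # No overlap: close the current interval and start a new one.
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current interval if needed.
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start + 1
    return covered / contig_length


# Three hits to (hypothetically) different phage genomes on a 400 bp contig.
example_hits = [(1, 100), (51, 150), (301, 400)]
single = best_single_hit_coverage(example_hits, 400)
merged = merged_coverage(example_hits, 400)
```

In this toy case no single hit covers more than a quarter of the contig, while the merged hits cover well over half of it, which is the kind of signal a single-hit approach would miss for a mosaic genome.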

Grants and funding

This work was supported by the Center for Genomic Epidemiology (http://www.genomicepidemiology.org/) at the Technical University of Denmark and funded by grant 09-067103/DSF from the Danish Council for Strategic Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.