snp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing

Ali Al-Shahib; Anthony Underwood

doi:10.1186/1471-2105-14-326

BMC Bioinformatics. 2013 Nov 19:14:326. doi: 10.1186/1471-2105-14-326.

Abstract

Background: A typical bacterial pathogen genome mapping project can identify thousands of single nucleotide polymorphisms (SNPs). Interpreting SNP data is complex, and it is difficult to conceptualise the data contained within the large flat files that most SNP calling algorithms produce as output. One solution to this problem is to construct a database that can be queried using simple commands, so that SNP interrogation and output are both easy and comprehensible.

Results: Here we present snp-search, a tool that manages SNP data and allows it to be manipulated and searched. After a SNP database has been created from a VCF file, snp-search can be used to convert the selected SNP data into FASTA sequences, construct phylogenies, look for unique SNPs, and output contextual information about each SNP. The FASTA output from snp-search is particularly useful for generating robust phylogenetic trees based on SNP differences across the conserved positions in whole genomes. Queries can be designed to answer critical genomic questions, such as whether SNPs are associated with particular phenotypes.
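The workflow described above (load VCF records into a database, export the SNP alleles as FASTA, and look for SNPs unique to one strain) can be illustrated with a minimal Python sketch. This is not snp-search's own code, schema, or command-line interface: the table layout, the function names (`build_db`, `snp_fasta`, `unique_snps`), the toy VCF data, and the haploid `0`/`1` genotype encoding are all assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical toy VCF: three haploid strains, three SNP positions.
# Genotype "0" means the reference allele, "1" the alternate allele.
VCF_TEXT = """\
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tstrainA\tstrainB\tstrainC
chr1\t100\t.\tA\tG\t60\tPASS\t.\tGT\t0\t1\t1
chr1\t250\t.\tC\tT\t60\tPASS\t.\tGT\t1\t0\t0
chr1\t400\t.\tG\tA\t60\tPASS\t.\tGT\t0\t0\t1
"""


def build_db(vcf_text):
    """Load per-strain alleles into an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alleles (pos INTEGER, strain TEXT, base TEXT, is_alt INTEGER)"
    )
    strains = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and VCF meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            strains = fields[9:]  # sample names follow the FORMAT column
            continue
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        for strain, genotype in zip(strains, fields[9:]):
            is_alt = 1 if genotype == "1" else 0
            conn.execute(
                "INSERT INTO alleles VALUES (?, ?, ?, ?)",
                (pos, strain, alt if is_alt else ref, is_alt),
            )
    conn.commit()
    return conn


def snp_fasta(conn):
    """Concatenate each strain's alleles, ordered by position, as FASTA text."""
    rows = conn.execute(
        "SELECT strain, base FROM alleles ORDER BY strain, pos"
    ).fetchall()
    seqs = {}
    for strain, base in rows:
        seqs[strain] = seqs.get(strain, "") + base
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


def unique_snps(conn, strain):
    """Return positions where only the given strain carries the alternate allele."""
    rows = conn.execute(
        "SELECT pos FROM alleles WHERE is_alt = 1 "
        "GROUP BY pos HAVING COUNT(*) = 1 AND MIN(strain) = ? ORDER BY pos",
        (strain,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_db(VCF_TEXT)
fasta = snp_fasta(conn)
```

In this sketch, the FASTA text contains one pseudo-sequence of SNP alleles per strain, which is the kind of input an alignment-based phylogeny program expects, and `unique_snps(conn, "strainC")` lists the positions that distinguish that strain from the others. Real VCF files carry diploid or multi-allelic genotypes, quality filters, and annotation fields that this sketch deliberately ignores.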

Conclusions: snp-search is a tool that manages SNP data and outputs useful information which can be used to test important biological hypotheses.

MeSH terms

Algorithms
Genome, Bacterial*
High-Throughput Nucleotide Sequencing
Phylogeny
Polymorphism, Single Nucleotide*
Sequence Analysis, DNA
Software*
Streptococcus pyogenes / classification
Streptococcus pyogenes / genetics*