We describe PGTools, an open-source software suite for the analysis and visualization of proteogenomic data. PGTools comprises applications, libraries, customized databases, and visualization tools for analyzing mass spectrometry data against combined proteomic and genomic backgrounds. A single command is sufficient to search databases, calculate false discovery rates, group and annotate proteins, generate peptide databases from RNA-Seq transcripts, identify altered proteins associated with cancer, and visualize genome-scale peptide data sets. We experimentally confirm a subset of proteogenomic peptides in human PANC-1 cells and demonstrate the utility of PGTools on a colorectal cancer data set, in which it identified 203 novel protein-coding regions missed by conventional proteomic approaches. PGTools should be equally useful for individual proteogenomic investigations and for international initiatives such as the chromosome-centric Human Proteome Project (C-HPP). PGTools is available at http://qcmg.org/bioinformatics/PGTools.
Keywords: RNA-seq; automated workflows; cancer; mass spectrometry; mutated peptides; noncoding RNA; proteogenomics; pseudogenes.