Comparative omics-driven genome annotation refinement: application across Yersiniae

Alexandra C Schrimpe-Rutledge; Marcus B Jones; Sadhana Chauhan; Samuel O Purvine; James A Sanford; Matthew E Monroe; Heather M Brewer; Samuel H Payne; Charles Ansong; Bryan C Frank; Richard D Smith; Scott N Peterson; Vladimir L Motin; Joshua N Adkins

doi:10.1371/journal.pone.0033903


PLoS One. 2012;7(3):e33903. doi: 10.1371/journal.pone.0033903. Epub 2012 Mar 27.

Authors

Alexandra C Schrimpe-Rutledge¹, Marcus B Jones, Sadhana Chauhan, Samuel O Purvine, James A Sanford, Matthew E Monroe, Heather M Brewer, Samuel H Payne, Charles Ansong, Bryan C Frank, Richard D Smith, Scott N Peterson, Vladimir L Motin, Joshua N Adkins

Affiliation

¹ Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America.

Abstract

Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. The annotation process is now performed almost exclusively in an automated fashion to keep pace with the large number of sequences generated. One possible way of reducing errors inherent to automated computational annotations is to apply data from omics measurements (i.e., transcriptomic and proteomic) to the un-annotated genome with a proteogenomic-based approach. Here, the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species. Transcriptomic and proteomic data derived from highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis Pestoides F, and Y. pseudotuberculosis PB1/+) were used to demonstrate a comprehensive comparative omics-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and identified 28 novel and 68 incorrect (i.e., observed frameshifts, extended start sites, and translated pseudogenes) protein-coding sequences within the three current genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen; thus, the discovery of many translated pseudogenes, including the insertion-ablated argD, underscores a need for functional analyses to investigate hypotheses related to divergence. Refinements included the discovery of a seemingly essential ribosomal protein, several virulence-associated factors, a transcriptional regulator, and many hypothetical proteins that were missed during annotation.
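
The idea of applying peptide evidence to an un-annotated genome can be illustrated with a minimal sketch. It is not the authors' pipeline, which the abstract does not describe in detail. It only shows the general proteogenomic step of translating a DNA sequence in all six reading frames and locating an observed peptide without consulting existing gene calls. The function names (`six_frame_translations`, `locate_peptide`) and the toy sequence are hypothetical, and exact string matching stands in for a real peptide search engine.

```python
from itertools import product

# Standard genetic code, listed in TCAG order. The bacterial table shares the
# same codon-to-amino-acid mapping and differs only in permitted start codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(genome):
    """Map (strand, offset) to the protein string for that reading frame."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def locate_peptide(peptide, genome):
    """Return (strand, offset, residue_index) for every exact peptide match."""
    hits = []
    for (strand, offset), protein in six_frame_translations(genome).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((strand, offset, start))
            start = protein.find(peptide, start + 1)
    return hits


# Toy example (made-up sequence, not Yersinia data).
toy_genome = "ATGGCTAAAGGTTAA"
print(locate_peptide("MAKG", toy_genome))  # [('+', 0, 0)]
```

A hit that falls outside every annotated coding sequence, or in a different frame from the annotated gene at that locus, would be a candidate novel gene or frameshift to examine further. Comparing such hits across related genomes would be the comparative step the abstract refers to.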

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Amino Acid Sequence
Base Sequence
Computational Biology / methods
Frameshift Mutation
Genome, Bacterial*
Genomics*
Molecular Sequence Annotation*
Molecular Sequence Data
Open Reading Frames
Peptides / chemistry
Proteomics*
Pseudogenes
Sequence Alignment
Transcription Initiation Site
Yersinia / genetics*
Yersinia / metabolism*
