A computational tool for the genomic identification of regions of unusual compositional properties and its utilization in the detection of horizontally transferred sequences

Mol Biol Evol. 2006 Oct;23(10):1863-8. doi: 10.1093/molbev/msl053. Epub 2006 Jul 7.

Abstract

Similarity Plot (S-plot) is a Windows-based application for large-scale comparisons and 2-dimensional visualization of compositional similarities between genomic sequences. This application combines 2 approaches widely used in genomics: window analysis of statistical characteristics along genomes and dot-plot visual representation. S-plot is effective in identifying highly similar regions between genomes as well as regions with unusual compositional properties (RUCPs) within a single genome, which may be indicative of horizontal gene transfer or of locus-specific selective forces. We use S-plot in a 2-step approach to identify regions that may have originated through horizontal gene transfer: first comparing a genomic sequence to itself and then comparing it to the genomic sequence of a closely related taxon. Moreover, by comparing these suspect sequences to one another, we can estimate a minimum number of sources for these putative xenologous sequences. We illustrate the uses of S-plot in a comparison involving Escherichia coli K12 and E. coli O157:H7. In O157:H7, we found 145 regions that have most probably originated through horizontal gene transfer. By using S-plot to compare each of these regions with 277 completely sequenced prokaryotic genomes, 1 sequence was found to have similar compositional properties to the Yersinia pseudotuberculosis genome, indicating a transfer from a Yersinia or Yersinia relative. Based upon our analysis of RUCPs in O157:H7, we infer that there were at least 53 sources of horizontally transferred sequences.
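
The general idea of combining window analysis with a dot-plot-style comparison can be illustrated with a minimal Python sketch. This is not S-plot's implementation: the abstract does not specify the compositional statistic, window size, or cutoff, so the dinucleotide profiles, the L1-based similarity score, the threshold, and all function names below are assumptions chosen only for illustration.

```python
from itertools import product


def kmer_profile(seq, k=2):
    """Relative k-mer frequencies of seq, in fixed lexicographic order.
    (Assumed statistic; the abstract does not say which one S-plot uses.)"""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing ambiguous bases
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[w] / total for w in kmers]


def window_profiles(seq, window, step, k=2):
    """One compositional profile per window along the sequence."""
    starts = range(0, len(seq) - window + 1, step)
    return [kmer_profile(seq[s:s + window], k) for s in starts]


def similarity(p, q):
    """1 minus half the L1 distance: 1.0 for identical profiles,
    0.0 for profiles with no k-mers in common."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def similarity_matrix(profiles_a, profiles_b):
    """Window-by-window similarity grid, the data behind a dot-plot view."""
    return [[similarity(p, q) for q in profiles_b] for p in profiles_a]


def unusual_windows(matrix, threshold):
    """Indices of rows whose mean similarity is below threshold
    (hypothetical cutoff rule for flagging candidate RUCPs)."""
    return [i for i, row in enumerate(matrix)
            if sum(row) / len(row) < threshold]


# Toy genome: an AT-rich background with a GC-rich insert at position 100.
toy = "AT" * 50 + "GC" * 10 + "AT" * 50
profiles = window_profiles(toy, window=20, step=20)
self_matrix = similarity_matrix(profiles, profiles)
print(unusual_windows(self_matrix, threshold=0.5))  # prints [5]
```

Under the same assumptions, the second step described in the abstract could reuse `similarity_matrix` with window profiles from a closely related genome as the second argument, and comparing the flagged windows to one another could group them by likely source.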

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • DNA, Bacterial / genetics
  • Escherichia coli K12 / genetics
  • Escherichia coli O157 / genetics
  • Evolution, Molecular*
  • Gene Transfer, Horizontal
  • Genome, Bacterial
  • Genomics / statistics & numerical data*
  • Software*
  • Species Specificity
  • Yersinia pseudotuberculosis / genetics

Substances

  • DNA, Bacterial