The non-random clustering of non-synonymous substitutions and its relationship to evolutionary rate

BMC Genomics. 2011 Aug 16;12:415. doi: 10.1186/1471-2164-12-415.

Abstract

Background: Protein sequences are subject to a mosaic of constraint. Changes to functional domains and buried residues, for example, are more apt to disrupt protein structure and function than are changes to residues participating in loops or exposed to solvent. Regions of constraint on the tertiary structure of a protein often result in loose segmentation of its primary structure into stretches of slowly and rapidly evolving amino acids. This clustering can be exploited, and existing methods have done so by relying on local sequence conservation as a signature of selection to help identify functionally important regions within proteins. We invert this paradigm by leveraging the regional nature of protein structure and function to both illuminate and make use of genome-wide patterns of local sequence conservation.

Results: Our hypothesis is that the regional nature of structural and functional constraints will impose a positive autocorrelation on the evolutionary rates of neighboring sites, which, in a pairwise comparison of orthologous proteins, will manifest itself as the clustering of non-synonymous changes across the amino acid sequence. We introduce a dispersion ratio statistic to test this and related hypotheses. Using genome-wide interspecific comparisons of orthologous protein pairs, we reveal a strong log-linear relationship between the degree of clustering and the intensity of constraint. We further demonstrate how this relationship varies with the evolutionary distance between the species being compared. We provide some evidence that proteins with a history of positive selection deviate from genome-wide trends.
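
The abstract does not define the dispersion ratio statistic, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a generic index of dispersion (the variance-to-mean ratio of substitution counts in fixed, non-overlapping windows) as a stand-in. The function names, the window size, and the gap handling are all hypothetical choices made for this example. Values well above 1 suggest clustering, values near 1 are roughly what a uniform random scatter would give, and values below 1 suggest even spacing.

```python
from statistics import mean, variance


def substitution_positions(seq_a, seq_b):
    """Return 0-based alignment columns where two aligned proteins differ.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        i
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b and a != "-" and b != "-"
    ]


def dispersion_index(positions, length, window):
    """Variance-to-mean ratio of substitution counts per window.

    Illustrative stand-in only; the paper's statistic may be defined
    differently. Trailing columns that do not fill a window are dropped.
    """
    n_windows = length // window
    if n_windows < 2:
        raise ValueError("need at least two full windows")
    counts = [0] * n_windows
    for p in positions:
        w = p // window
        if w < n_windows:
            counts[w] += 1
    m = mean(counts)
    if m == 0:
        return float("nan")  # no substitutions, ratio is undefined
    return variance(counts) / m  # sample variance divided by mean


# Four changes packed into one window versus one change per window.
clustered = dispersion_index([0, 1, 2, 3], length=40, window=10)
even = dispersion_index([0, 10, 20, 30], length=40, window=10)
```

In this toy case the clustered layout gives a higher ratio than the evenly spaced one, which is the qualitative behavior a clustering statistic of this kind is meant to capture. The result depends on the window size, so any real analysis would need to justify that choice.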

Conclusions: We find a significant association between the evolutionary rate of a protein and the degree to which non-synonymous changes cluster along its primary sequence. We show that clustering is a non-redundant predictor of evolutionary rate, and we speculate that conflicting signals of clustering and constraint may be indicative of a historical period of relaxed selection.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Amino Acid Sequence*
  • Amino Acid Substitution*
  • Animals
  • Cluster Analysis
  • Evolution, Molecular*
  • Humans
  • Molecular Sequence Data
  • Proteins / chemistry
  • Proteins / genetics*
  • Sequence Alignment

Substances

  • Proteins