The scale-free nature of protein sequence space

PLoS One. 2018 Aug 1;13(8):e0200815. doi: 10.1371/journal.pone.0200815. eCollection 2018.

Abstract

The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree distribution (number of nodes with a given number of neighbors) and by their fractal network dimension. Although the five protein families differed in sequence length, fold, and domain arrangement, their network properties were similar. The fractal network dimension Df was distance-dependent: a high dimension for single and double mutants (Df = 4.0), which dropped to Df = 0.7-1.0 at 90% sequence identity, and increased to Df = 3.5-4.5 below 70% sequence identity. The distance dependency of the network dimension is consistent with evolutionary constraints for functional proteins. While random single and double mutations often result in a functional protein, the accumulation of more than ten mutations is dominated by epistasis. The networks of the five protein families were highly inhomogeneous with few highly connected communities ("hub sequences") and a large number of smaller and less connected communities. The degree distributions followed a power-law distribution with similar scaling exponents close to 1. Because the hub sequences have a large number of functional neighbors, they are expected to be robust toward possible deleterious effects of mutations. Because of their robustness, hub sequences have the potential of high innovability, with additional mutations readily inducing new functions. Therefore, they form hotspots of evolution and are promising candidates as starting points for directed evolution experiments in biotechnology.
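The network construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the sequences are already aligned to equal length, so identity reduces to the fraction of matching positions, whereas the study used global sequence identity. The function names, the toy sequences, and the 0.85 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences.

    Assumption: equal-length, pre-aligned input. A real analysis would
    compute a global pairwise alignment first.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(seqs, threshold):
    """Return adjacency sets; nodes i and j share an edge if their
    identity exceeds the threshold."""
    neighbors = {i: set() for i in range(len(seqs))}
    for i, j in combinations(range(len(seqs)), 2):
        if identity(seqs[i], seqs[j]) > threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def degree_distribution(neighbors):
    """Map each degree k to the number of nodes with exactly k neighbors."""
    return Counter(len(n) for n in neighbors.values())


# Made-up toy sequences, not taken from the study.
toy_seqs = [
    "ACDEFGHIKL",
    "ACDEFGHIKM",
    "ACDEFGHIRM",
    "ACDEYGHIKL",
    "WWWWWWWWWW",
]
network = build_network(toy_seqs, threshold=0.85)
distribution = degree_distribution(network)
print(sorted(distribution.items()))
```

On a real superfamily, plotting the node count against the degree on log-log axes would show whether the distribution is roughly linear, as expected for a power law. The all-pairs loop is quadratic in the number of sequences, so it is only practical for small sets.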

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Fractals
  • Models, Molecular*
  • Mutation
  • Protein Domains
  • Protein Folding
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism
  • Sequence Alignment

Substances

  • Proteins

Grants and funding

This work was funded by the Deutsche Forschungsgemeinschaft (FOR 1296 (JP), EXC 310 (CZ), PL145/16-1 (PCFB)). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.