Structure-based machine-guided mapping of amyloid sequence space reveals uncharted sequence clusters with higher solubilities

Nikolaos Louros; Gabriele Orlando; Matthias De Vleeschouwer; Frederic Rousseau; Joost Schymkowitz

doi:10.1038/s41467-020-17207-3

Nat Commun. 2020 Jul 3;11(1):3314. doi: 10.1038/s41467-020-17207-3.

Authors

Nikolaos Louros^{1,2}, Gabriele Orlando^{1,2}, Matthias De Vleeschouwer^{1,2}, Frederic Rousseau^{3,4}, Joost Schymkowitz^{5,6}

Affiliations

¹ Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium.
² Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium.
³ Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium. Frederic.Rousseau@kuleuven.vib.be.
⁴ Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium. Frederic.Rousseau@kuleuven.vib.be.
⁵ Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium. Joost.Schymkowitz@kuleuven.vib.be.
⁶ Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium. Joost.Schymkowitz@kuleuven.vib.be.

Abstract

The amyloid conformation can be adopted by a variety of sequences, but the precise boundaries of amyloid sequence space are still unclear. The currently charted amyloid sequence space is strongly biased towards hydrophobic, beta-sheet-prone sequences that form the core of globular proteins and towards Q/N/Y-rich yeast prions. Here, we took advantage of the increasing amount of high-resolution structural information on amyloid cores currently available in the Protein Data Bank to implement a machine learning approach, named Cordax (https://cordax.switchlab.org), that explores amyloid sequence space beyond its current boundaries. Clustering by t-Distributed Stochastic Neighbour Embedding (t-SNE) shows that our approach resulted in an expansion away from hydrophobic amyloid sequences towards clusters of lower aliphatic content and higher charge, or regions of helical and disordered propensities. These clusters uncouple amyloid propensity from solubility, representing sequence flavours compatible with surface-exposed patches in globular proteins, functional amyloids or sequences associated with liquid-liquid phase transitions.
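
To make the phrase "lower aliphatic content and higher charge" concrete, the following minimal sketch computes two simple per-peptide descriptors. It is an illustration only and does not reproduce the Cordax feature set or scoring. The residue groupings (A, I, L, V as aliphatic; K and R as +1, D and E as -1, histidine treated as neutral) and the function names are assumptions made for this example. KLVFFA and GNNQQNY are well-known amyloid-forming segments from amyloid-beta and Sup35; DEKRDE is a hypothetical charged peptide.

```python
# Illustrative sequence descriptors; NOT the features used by Cordax.
ALIPHATIC = set("AILV")  # assumption: Ala, Ile, Leu, Val count as aliphatic
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}  # assumption: His treated as neutral


def aliphatic_fraction(peptide: str) -> float:
    """Fraction of residues that belong to the assumed aliphatic set."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(res in ALIPHATIC for res in peptide) / len(peptide)


def net_charge(peptide: str) -> int:
    """Integer net charge from the simplified charge table above."""
    return sum(CHARGE.get(res, 0) for res in peptide.upper())


def describe(peptide: str) -> dict:
    """Bundle both descriptors for one peptide."""
    return {
        "peptide": peptide,
        "aliphatic_fraction": aliphatic_fraction(peptide),
        "net_charge": net_charge(peptide),
    }


if __name__ == "__main__":
    for pep in ("KLVFFA", "GNNQQNY", "DEKRDE"):
        print(describe(pep))
```

Descriptors of this kind could serve as axes or colour labels when inspecting a low-dimensional embedding such as t-SNE, but the choice of features here is hypothetical.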

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Amyloid / chemistry*
Amyloid / metabolism
Amyloidogenic Proteins / chemistry*
Amyloidogenic Proteins / metabolism
Amyloidosis / metabolism
Humans
Hydrophobic and Hydrophilic Interactions
Machine Learning
Models, Chemical*
Peptides / chemistry*
Peptides / metabolism
Protein Conformation
Protein Engineering / methods
Solubility

Substances

Amyloid
Amyloidogenic Proteins
Peptides