Decoding glycosylation potential from protein structure across human glycoproteins with a multi-view recurrent neural network

bioRxiv [Preprint]. 2024 May 23:2024.05.15.594334. doi: 10.1101/2024.05.15.594334.

Abstract

Glycosylation is described as a non-templated biosynthesis. Yet, the template-free premise is antithetical to the observation that different N-glycans are consistently placed at specific sites. It has been proposed that glycosite-proximal protein structures could constrain glycosylation and explain the observed microheterogeneity. Using site-specific glycosylation data, we trained a hybrid neural network to parse glycosites (recurrent neural network) and match them to feasible N-glycosylation events (graph neural network). From glycosite-flanking sequences, the algorithm predicts most human N-glycosylation events documented in the GlyConnect database and proposed structures corresponding to observed monosaccharide composition of the glycans at these sites. The algorithm also recapitulated glycosylation in Enhanced Aromatic Sequons, SARS-CoV-2 spike, and IgG3 variants, thus demonstrating the ability of the algorithm to predict both glycan structure and abundance. Thus, protein structure constrains glycosylation, and the neural network enables predictive in silico glycosylation of uncharacterized or novel protein sequences and genetic variants.

Publication types

  • Preprint