Gut proteases such as chymotrypsin degrade peptide therapeutics, limiting their oral delivery. Existing peptidase databases are limited in size and do not enable systematic analysis of protease substrate preferences, especially for non-natural amino acids. As a result, stability optimization of hit compounds is time- and resource-intensive. To accelerate the stability optimization of peptide ligands, we used mRNA display to generate large datasets of chymotrypsin-resistant peptides and used these data to build a predictive model for chymotrypsin-resistant sequences. Through analysis of enriched motifs, we recapitulate known chymotrypsin cleavage sites, reveal position-dependent effects of monomers on peptide cleavage, and report previously unidentified protective and destabilizing residues. We then developed a machine-learning model that predicts peptide resistance to chymotrypsin cleavage, and we validated both the model's performance and the underlying NGS data by measuring chymotrypsin half-lives for a subset of peptides. Finally, we simulated stability prediction for non-natural amino acids with a leucine hold-out model and observed robust performance. Overall, we demonstrate the utility of mRNA display as a tool for large-scale data generation and show that pairing mRNA display with machine learning yields valuable predictions of chymotrypsin cleavage. Extending this workflow to additional proteases could provide complementary predictive models to focus future peptide drug discovery efforts.
Keywords: Peptides; cheminformatics; enzymes; mRNA; machine learning.
© 2024 Wiley‐VCH GmbH.