Analysis and Prediction of Chymotrypsin Substrate Preferences through Large Data Acquisition with Target-Free mRNA Display

Dan Sindhikara; Sabrina E Iskandar; Lindsey Guan; Rumit Maini; Christopher J Hipolito; Congliang Sun; Lisa A Vasicek; Adam Weinglass; S Adrian Saldanha

doi:10.1002/cbic.202400760

Chembiochem. 2024 Nov 15:e202400760. doi: 10.1002/cbic.202400760. Online ahead of print.

Authors

Dan Sindhikara¹, Sabrina E Iskandar², Lindsey Guan³, Rumit Maini⁴, Christopher J Hipolito⁵, Congliang Sun⁶, Lisa A Vasicek⁶, Adam Weinglass⁷, S Adrian Saldanha⁷

Affiliations

¹ Merck & Co Inc, Modeling and Informatics, 2025 E Scott Ave, 07065, Rahway, UNITED STATES OF AMERICA.
² Merck & Co Inc, Screening and Compound Profiling, 07065, Rahway, UNITED STATES OF AMERICA.
³ Merck & Co Inc, Modeling and Informatics, 07065, Rahway, UNITED STATES OF AMERICA.
⁴ Eli Lilly and Company Biotechnology Center San Diego, Peptide Discovery, UNITED STATES OF AMERICA.
⁵ Merck Pharmaceuticals, Screening and Compound Profiling, Quantitative Biosciences, UNITED STATES OF AMERICA.
⁶ Merck and Co Inc West Point, Pharmacokinetics, Dynamics, Metabolism, and Bioanalytics, UNITED STATES OF AMERICA.
⁷ Merck & Co Inc, Screening and Compound Profiling, Quantitative Biosciences, UNITED STATES OF AMERICA.

PMID: 39547944
DOI: 10.1002/cbic.202400760

Abstract

Oral delivery of peptide therapeutics is limited by degradation by gut proteases such as chymotrypsin. Existing databases of peptidases are limited in size and do not enable systematic analyses of protease substrate preferences, especially for non-natural amino acids. Thus, stability optimization of hit compounds is time- and resource-intensive. To accelerate the stability optimization of peptide ligands, we generated large datasets of chymotrypsin-resistant peptides via mRNA display to create a predictive model for chymotrypsin-resistant sequences. Through analysis of enriched motifs, we recapitulate known chymotrypsin cleavage sites, reveal position-dependent effects of monomers on peptide cleavage, and report previously unidentified protective and destabilizing residues. We then developed a machine-learning-based model that predicts peptide resistance to chymotrypsin cleavage, and validated both model performance and the NGS experimental data by measuring chymotrypsin half-lives for a subset of peptides. Finally, we simulated stability predictions on non-natural amino acids through a leucine hold-out model and observed robust performance. Overall, we demonstrate the utility of mRNA display as a tool for big data generation and show that pairing mRNA display with machine learning yields valuable predictions for chymotrypsin cleavage. Expansion of this workflow to additional proteases could provide complementary predictive models that focus future peptide drug discovery efforts.
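
The abstract does not say how the leucine hold-out was constructed. One plausible reading, sketched below in Python, is that the model is trained only on sequences lacking leucine and then evaluated on sequences that contain it, so that leucine stands in for a monomer the model has never seen. The function name `holdout_split`, the toy sequences, and their resistance labels are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# (sequence, resistant?) pairs. The labels below are invented for illustration.
Peptide = Tuple[str, bool]


def holdout_split(
    data: List[Peptide], residue: str = "L"
) -> Tuple[List[Peptide], List[Peptide]]:
    """Split peptides by whether they contain the held-out residue.

    Training set: sequences that never contain `residue`.
    Test set: sequences that contain `residue` at least once.
    """
    train = [p for p in data if residue not in p[0]]
    test = [p for p in data if residue in p[0]]
    return train, test


# Hypothetical toy data, not from the study.
toy = [
    ("GAPSTK", True),
    ("AFWGSK", False),
    ("GLPSTK", True),
    ("ALYGSK", False),
]

train, test = holdout_split(toy, residue="L")
print("train:", [seq for seq, _ in train])
print("test: ", [seq for seq, _ in test])
```

For such a split to be informative, the model would presumably need to encode residues by transferable descriptors (for example physicochemical or structural features) rather than by identity alone, since an identity-only encoding carries no information about an unseen monomer. That encoding choice is an assumption here as well.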

Keywords: Peptides; cheminformatics; enzymes; mRNA; machine learning.