State-of-the-art mass spectrometers combined with modern bioinformatics algorithms for peptide-to-spectrum matching (PSM) with robust statistical scoring allow for more variable features (i.e., post-translational modifications) being reliably identified from (tandem-) mass spectrometry data, often without the need for biochemical enrichment. Semi-specific proteome searches, that enforce a theoretical enzymatic digestion to solely the N- or C-terminal end, allow to identify of native protein termini or those arising from endogenous proteolytic activity (also referred to as "neo-N-termini" analysis or "N-terminomics"). Nevertheless, deriving biological meaning from these search outputs can be challenging in terms of data mining and analysis. Thus, we introduce TermineR, a data analysis approach for the (1) annotation of peptides according to their enzymatic cleavage specificity and known protein processing features, (2) differential abundance and enrichment analysis of N-terminal sequence patterns, and (3) visualization of neo-N-termini location. We illustrate the use of TermineR by applying it to tandem mass tag (TMT)-based proteomics data of a mouse model of polycystic kidney disease, and assess the semi-specific searches for biological interpretation of cleavage events and the variable contribution of proteolytic products to general protein abundance. The TermineR approach and example data are available as an R package at https://github.com/MiguelCos/TermineR.
Keywords: data processing; polycystic kidney disease; proteolysis; terminomics.
© 2024 The Author(s). PROTEOMICS published by Wiley‐VCH GmbH.