Signal peptides are N-terminal sequences that mediate the targeting and translocation of secreted or cell-surface proteins to the endoplasmic reticulum (ER) membrane. Because signal peptides vary widely in sequence, traditional conservation-based methods for predicting the effects of an amino acid substitution may be of limited use. To address this, we present a scoring function that assesses the effects of an amino acid change within the signal peptide by using data from SignalP, a signal peptide prediction algorithm. Our score incorporates the maximum changes in the C- and S-scores from SignalP between the original and altered versions of the signal peptide. We demonstrate that this metric can discriminate disease-associated mutations from single nucleotide polymorphisms (SNPs) in signal peptides. We further show that polymorphisms with low minor allele frequency (MAF) are more likely to affect the function of the signal peptide. In conjunction with Sorting Intolerant From Tolerant (SIFT), a conservation-based amino acid substitution prediction method, our approach classifies such changes to signal peptides more accurately than other known alternatives, including D-score-based methods. We also examine experimentally characterized mutations and find that our metric minimizes false positives and can predict whether a mutation will affect cleavage or translocation. Finally, we apply our approach to a recently generated large-scale set of somatic mutations from colon and breast cancers and produce a prioritized list of signal peptide mutations that might impair protein function.
Copyright 2008 Wiley-Liss, Inc.
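
As an illustration of the kind of score described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the per-position C- and S-scores for the original and altered signal peptide have already been obtained from SignalP and are passed in as plain lists. The function names, the use of the absolute per-position difference, and the summing of the two maxima into one value are all assumptions made for illustration; the abstract states only that the score incorporates the maximum changes in the C- and S-scores.

```python
from typing import Sequence, Tuple


def max_abs_change(original: Sequence[float], altered: Sequence[float]) -> float:
    """Return the largest per-position absolute difference between two score tracks."""
    if len(original) != len(altered):
        raise ValueError("score tracks must have the same length")
    if not original:
        raise ValueError("score tracks must not be empty")
    return max(abs(o - a) for o, a in zip(original, altered))


def signal_peptide_change_score(
    c_original: Sequence[float],
    c_altered: Sequence[float],
    s_original: Sequence[float],
    s_altered: Sequence[float],
) -> Tuple[float, float, float]:
    """Hypothetical score built from the maximum C- and S-score changes.

    Returns (max C-score change, max S-score change, combined value).
    """
    delta_c = max_abs_change(c_original, c_altered)
    delta_s = max_abs_change(s_original, s_altered)
    # ASSUMPTION: the two maxima are simply added; the source does not
    # specify how they are combined.
    combined = delta_c + delta_s
    return delta_c, delta_s, combined


if __name__ == "__main__":
    # Made-up score tracks for a three-residue window (illustrative only,
    # not real SignalP output).
    c_wt, c_mut = [0.0, 0.5, 0.25], [0.0, 0.25, 0.25]
    s_wt, s_mut = [1.0, 0.75, 0.5], [0.5, 0.75, 0.5]
    print(signal_peptide_change_score(c_wt, c_mut, s_wt, s_mut))
```

Returning the two maxima separately alongside the combined value keeps the sketch usable under a different combination rule, and the separate values could in principle be inspected when asking whether a change is more likely to affect cleavage (C-score) or translocation (S-score).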