The molecular cause of human disease has remained one of the most attractive scientific research targets for decades. An effective approach to this topic is the analysis and identification of disease-related amino acid polymorphisms. In this work, we developed SeqSubPred, a concise and promising method for identifying deleterious amino acid polymorphisms, based on 44 features extracted solely from protein sequence. SeqSubPred achieved surprisingly good predictive ability, with an accuracy of 0.88 and an area under the receiver operating characteristic curve of 0.94, without resorting to homology or evolutionary information, which is frequently used in similar methods but is usually more complex and time-consuming. SeqSubPred also identified several critical sequence features from its random forest model, and these features offer some interesting insights into the factors affecting human disease-related amino acid substitutions. The online version of SeqSubPred is available at montana.informatics.indiana.edu/cgi-bin/seqmut/seqsubpred.cgi
Copyright © 2010 Wiley Periodicals, Inc.