PREDICT-2ND: a tool for generalized protein local structure prediction

Sol Katzman; Christian Barrett; Grant Thiltgen; Rachel Karchin; Kevin Karplus

doi:10.1093/bioinformatics/btn438

Bioinformatics. 2008 Nov 1;24(21):2453-9. doi: 10.1093/bioinformatics/btn438. Epub 2008 Aug 30.

Authors

Sol Katzman¹, Christian Barrett, Grant Thiltgen, Rachel Karchin, Kevin Karplus

Affiliation

¹ Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA.

Abstract

Motivation: Predictions of protein local structure, derived from sequence alignment information alone, provide visualization tools for biologists to evaluate the importance of amino acid residue positions of interest in the absence of X-ray crystal/NMR structures or homology models. They are also useful as inputs to sequence analysis and modeling tools, such as hidden Markov models (HMMs), which can be used to search for homology in databases of known protein structure. In addition, local structure predictions can be used as a component of cost functions in genetic algorithms that predict protein tertiary structure. We have developed a program (predict-2nd) that trains multilayer neural networks and have applied it to numerous local structure alphabets, tuning network parameters such as the number of layers, the number of units in each layer and the window sizes of each layer. We have had the most success with four-layer networks, with gradually increasing window sizes at each layer.
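
As a rough illustration of why gradually increasing window sizes matter, the sketch below computes how many sequence positions can influence a single output position when windowed layers are stacked. It assumes each layer slides its window one position at a time with no dilation. The window sizes used (5, 7, 9, 13) and the function name are hypothetical, chosen only for the arithmetic; they are not taken from the paper.

```python
def receptive_field(window_sizes):
    """Return the number of input positions that can influence one output
    position when windowed layers are stacked.

    Assumption: every layer slides its window by one position (stride 1)
    and uses no dilation, so each layer widens the span by (window - 1).
    """
    for w in window_sizes:
        if w < 1:
            raise ValueError("each window size must be at least 1")
    return 1 + sum(w - 1 for w in window_sizes)


# Hypothetical four-layer network with gradually increasing windows.
example_windows = [5, 7, 9, 13]
span = receptive_field(example_windows)
print(f"windows {example_windows} -> each output sees {span} positions")
```

Under these assumptions, each added layer widens the context seen by the output by its window size minus one, so a deeper stack covers a long stretch of sequence without needing a single very wide input window.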

Results: Because the four-layer neural nets occasionally get trapped in poor local optima, our training protocol now uses many different random starts, with short training runs, followed by more training on the best performing networks from the short runs. One recent addition to the program is the option to add a guide sequence to the profile inputs, increasing the number of inputs per position by 20. We find that use of a guide sequence provides a small but consistent improvement in the predictions for several different local-structure alphabets.
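
The multi-start protocol described above can be sketched in a few lines. This is a toy illustration only: a one-dimensional function with several local minima stands in for the network loss, and simple random-perturbation hill climbing stands in for neural network training. All names, step counts and the number of retained candidates are assumptions made for the example, not values from predict-2nd.

```python
import math
import random


def toy_loss(x):
    # Stand-in objective with several local minima; NOT a neural network loss.
    return 0.1 * x * x + math.sin(3.0 * x)


def train(x, steps, rng, step_size=0.1):
    """Hill climbing: propose a small random move, keep it only if the loss drops."""
    best = toy_loss(x)
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        candidate_loss = toy_loss(candidate)
        if candidate_loss < best:
            x, best = candidate, candidate_loss
    return x, best


def multi_start(n_starts=20, short_steps=50, long_steps=2000, keep=3, seed=0):
    """Many short runs from random starts, then longer runs on the best few."""
    rng = random.Random(seed)
    # Phase 1: short training runs from many random starting points.
    short_runs = [train(rng.uniform(-5.0, 5.0), short_steps, rng)
                  for _ in range(n_starts)]
    short_runs.sort(key=lambda run: run[1])
    # Phase 2: continue training only the best-performing short runs.
    refined = [train(x, long_steps, rng) for x, _ in short_runs[:keep]]
    return short_runs, min(refined, key=lambda run: run[1])


short_runs, (best_x, best_loss) = multi_start()
print(f"best short-run loss: {short_runs[0][1]:.4f}")
print(f"loss after extended training: {best_loss:.4f}")
```

The design point is that short runs are cheap, so many starting points can be screened before the longer training budget is spent on the few that look promising.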

Availability: Local structure prediction with the methods described here is available for use online at http://www.soe.ucsc.edu/compbio/SAM_T08/T08-query.html. The source code and example networks for PREDICT-2ND are available at http://www.soe.ucsc.edu/~karplus/predict-2nd/. A required C++ library is available at http://www.soe.ucsc.edu/~karplus/ultimate/.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Amino Acid Sequence
Molecular Sequence Data
Neural Networks, Computer
Protein Conformation*
Proteins / chemistry*
Sequence Alignment
Sequence Analysis, Protein
Software*

Substances

Proteins

Grants and funding

R01 GM068570/GM/NIGMS NIH HHS/United States