Large-scale analysis of non-synonymous coding region single nucleotide polymorphisms

Bioinformatics. 2004 May 1;20(7):1006-14. doi: 10.1093/bioinformatics/bth029. Epub 2004 Jan 29.

Abstract

Motivation: Single nucleotide polymorphisms (SNPs) are the most common form of genetic variant in humans. SNPs causing amino acid substitutions are of particular interest as candidates for loci affecting susceptibility to complex diseases, such as diabetes and hypertension. To efficiently screen SNPs for disease association, it is important to distinguish neutral variants from deleterious ones.

Results: We describe the use of Pfam protein motif models and the HMMER program to predict whether amino acid changes in conserved domains are likely to affect protein function. We find that the magnitude of the change in the HMMER E-value caused by an amino acid substitution is a good predictor of whether it is deleterious. We provide internet-accessible display tools for a genome-wide collection of SNPs, including 7391 distinct non-synonymous coding region SNPs in 2683 genes.

Availability: http://lpgws.nci.nih.gov/cgi-bin/GeneViewer.cgi
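
The scoring idea in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states only that the magnitude of the E-value change is predictive, so the log10 ratio used here, the function names, and the cutoff of 1.0 are all assumptions made for illustration. The two E-values are assumed to come from separate HMMER runs of the wild-type and variant protein sequences against the same Pfam model; running HMMER itself is not shown.

```python
import math


def apply_substitution(sequence: str, position: int, ref: str, alt: str) -> str:
    """Return `sequence` with the residue at 1-based `position` changed from `ref` to `alt`."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    if sequence[position - 1] != ref:
        raise ValueError("reference residue does not match the sequence")
    return sequence[: position - 1] + alt + sequence[position:]


def log_evalue_change(e_wild: float, e_variant: float) -> float:
    """log10(e_variant / e_wild); a positive value means the variant matches the model less well.

    Assumption: both E-values are strictly positive. An E-value reported as 0
    (underflow) would need separate handling.
    """
    if e_wild <= 0 or e_variant <= 0:
        raise ValueError("E-values must be positive")
    return math.log10(e_variant) - math.log10(e_wild)


def is_candidate_deleterious(e_wild: float, e_variant: float, threshold: float = 1.0) -> bool:
    """Flag a substitution whose E-value worsens by at least `threshold` orders of magnitude.

    The default threshold is hypothetical; the abstract does not give a cutoff.
    """
    return log_evalue_change(e_wild, e_variant) >= threshold


if __name__ == "__main__":
    # Toy peptide and made-up E-values, for illustration only.
    variant = apply_substitution("MKTAYIAK", 4, "A", "V")
    print(variant)
    print(is_candidate_deleterious(1e-30, 1e-25))
```

Working on a log scale keeps the comparison meaningful across domains whose baseline E-values differ by many orders of magnitude; other measures of the change are equally compatible with the abstract's description.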

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Amino Acid Motifs
  • Computer Simulation
  • Databases, Protein
  • Gene Expression Profiling / methods*
  • Models, Molecular
  • Open Reading Frames / genetics
  • Polymorphism, Single Nucleotide / genetics*
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / genetics*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid
  • Software*
  • Structure-Activity Relationship
  • User-Computer Interface

Substances

  • Proteins