Protein family classification based on searching a database of blocks

Genomics. 1994 Jan 1;19(1):97-107. doi: 10.1006/geno.1994.1018.

Abstract

The most highly conserved regions of proteins can be represented as "blocks" of locally aligned sequence segments. Previously, an automated system was introduced to generate a database of blocks that is searched for local similarities using a sequence query. Here, we describe a method for searching this database that can also reveal significant global similarities. Local and global alignments are scored independently, so they can be used in concert to infer homology. A set of 7082 diverse sequences not represented in the database provided queries for testing this approach. The resulting distributions of scores led to guidelines for interpretation of search data and to the classification of 289 uncatalogued sequences into known groups. Thirty-eight of these relationships appear to be new discoveries. We also show how searching a database of blocks can be used to detect repeated domains and to find distinct cross-family relationships that were missed in searches of sequence databases.
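To make the idea of scoring a query against a block concrete, here is a minimal sketch. It is not the authors' method: it assumes a block is a list of equal-length aligned segments, converts it to a simple log-odds position-specific scoring matrix with a uniform background and add-one pseudocounts, and slides that matrix along a query to find the best ungapped local match. The function names (`block_to_pssm`, `best_local_hit`), the toy block, and the scoring scheme are all hypothetical choices made for illustration. The combined scoring of several blocks from one family, which the abstract describes as revealing global similarities, is not covered here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_pssm(block, pseudocount=1.0):
    """Convert a block (equal-length aligned segments) into per-column log-odds scores.

    Assumptions for illustration only: uniform background frequencies and
    a flat pseudocount added to every residue in every column.
    """
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("all segments in a block must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(segment[col] for segment in block)
        column_scores = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column_scores)
    return pssm


def best_local_hit(query, pssm):
    """Slide the matrix along the query and return (start, score) of the best window.

    Returns None when the query is shorter than the block. Residues outside
    the 20 standard amino acids contribute a score of 0.
    """
    width = len(pssm)
    best = None
    for start in range(len(query) - width + 1):
        score = sum(pssm[i].get(query[start + i], 0.0) for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy block and query, invented for this example.
toy_block = ["GDSGG", "GDSGS", "GDAGG"]
toy_query = "MKTLLGDSGGPAV"
hit = best_local_hit(toy_query, block_to_pssm(toy_block))
```

In this sketch `hit` holds the zero-based start of the best-scoring window and its score. A real search would repeat this for every block in the database and then judge each hit against the distribution of scores from unrelated queries, which is the role played by the score distributions described in the abstract.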

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • DNA Helicases / chemistry
  • Databases, Factual*
  • Mammals / genetics
  • Proteins / chemistry
  • Proteins / classification*
  • Repetitive Sequences, Nucleic Acid
  • Saccharomyces cerevisiae / genetics
  • Sequence Alignment*
  • Sequence Homology, Amino Acid*
  • Software

Substances

  • Proteins
  • DNA Helicases