Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity

Bioinformatics. 1999 Mar;15(3):180-6. doi: 10.1093/bioinformatics/15.3.180.

Abstract

Motivation: Gene regulation often depends on functional modules which feature a detectable internal organization. Overall sequence similarity of these modules is often insufficient for detection by general search methods such as FASTA or even Gapped BLAST. However, it is of interest to evaluate whether modules, which are often known only from experimental analysis of single sequences, are also present in other regulatory sequences.

Results: We developed a new method (FastM) which combines a search algorithm for individual transcription factor binding sites (MatInspector) with a distance correlation function. FastM allows fast definition of a model of correlated binding sites derived from as few as a single promoter or enhancer. ModelInspector results are suitable for evaluation of the significance of the model. We used FastM to define a model for the experimentally verified NFkappaB/IRF1 regulatory module from the major histocompatibility complex (MHC) class I HLA-B gene promoter. Analysis of a test set of sequences, as well as database searches with this model, showed excellent correlation of the model with the biological function of the module. These results could not be obtained by searches using FASTA or Gapped BLAST, which are based on sequence similarity. By analyzing several GenBank sections with a hypothetical GRE-GRE module, we were also able to demonstrate an association of this module with viral sequences.
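
Illustration: The idea of combining individual binding-site matches with a distance constraint can be sketched as follows. This is a minimal sketch, not the FastM implementation. It assumes simple IUPAC consensus strings in place of the weight matrices used by MatInspector, and the function names (find_sites, find_modules), the consensus strings, and the gap limits are hypothetical choices made only for this example.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(seq, consensus):
    """Return 0-based start positions of all (overlapping) consensus matches.

    Assumption: consensus matching stands in for matrix-based site detection.
    """
    # A lookahead lets re.finditer report overlapping matches.
    pattern = "(?=" + "".join(IUPAC[c] for c in consensus.upper()) + ")"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def find_modules(seq, site_a, site_b, min_gap, max_gap):
    """Return (start_a, start_b) pairs forming a two-site module.

    A pair qualifies when site_b starts between min_gap and max_gap
    bases after the end of site_a (same strand, fixed order).
    """
    hits_a = find_sites(seq, site_a)
    hits_b = find_sites(seq, site_b)
    modules = []
    for a in hits_a:
        end_a = a + len(site_a)
        for b in hits_b:
            if min_gap <= b - end_a <= max_gap:
                modules.append((a, b))
    return modules


# Toy sequence and illustrative consensus strings (not taken from the paper).
toy_seq = "ACGT" + "GGGACTTTCC" + "TTTTT" + "AAGTGAAA" + "CC"
print(find_modules(toy_seq, "GGGRNNYYCC", "AANNGAAA", 0, 10))
```

Because only the presence, order, and spacing of the two sites are tested, the bases between and around them may differ freely, which is why such a model can match modules that share little overall sequence similarity.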

Availability: The WWW version of FastM is accessible at http://www.gsf.de/cgi-bin/fastm.pl and http://genomatix.gsf.de/cgi-bin/fastm2/fastm.pl

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Binding Sites / genetics
  • DNA / genetics
  • Databases, Factual
  • HLA Antigens / genetics
  • Humans
  • Interferon-beta / genetics
  • Models, Genetic*
  • Molecular Sequence Data
  • Promoter Regions, Genetic*
  • Sequence Alignment / methods*
  • Sequence Alignment / statistics & numerical data
  • Sequence Homology, Nucleic Acid
  • Software*
  • Transcription Factors / metabolism
  • beta 2-Microglobulin / genetics

Substances

  • HLA Antigens
  • Transcription Factors
  • beta 2-Microglobulin
  • Interferon-beta
  • DNA