Occurrence probability of structured motifs in random sequences

J Comput Biol. 2002;9(6):761-73. doi: 10.1089/10665270260518254.

Abstract

Extracting motifs that may have a biological function from a set of nucleic acid sequences is an increasingly important problem. In this paper, we focus on particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. To assess their statistical significance, we propose approximations of the probability that such a structured motif occurs in a given sequence. We present an application of our method to the evaluation of candidate promoters in E. coli and B. subtilis. Simulations confirm the quality of the approximations.
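
To make the definition of a structured motif concrete, the following is a minimal sketch of how an occurrence could be detected and how its probability in a random sequence could be estimated by simulation. It is not the approximation proposed in the paper, only a brute-force Monte Carlo baseline of the kind such approximations might be checked against. The function names, the parameters (d_min, d_max, max_subs), and the random model (independent, uniformly distributed letters) are all assumptions made for illustration.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def has_structured_motif(seq, left, right, d_min, d_max, max_subs=0):
    """Return True if seq contains `left`, then a gap of d_min..d_max
    letters, then `right`, each part matched with at most `max_subs`
    substitutions. (Hypothetical helper, not from the paper.)"""
    n, m1, m2 = len(seq), len(left), len(right)
    for i in range(n - m1 + 1):
        if hamming(seq[i:i + m1], left) > max_subs:
            continue
        for d in range(d_min, d_max + 1):
            j = i + m1 + d  # start position of the second part
            if j + m2 > n:
                break  # larger gaps would run past the end of seq
            if hamming(seq[j:j + m2], right) <= max_subs:
                return True
    return False


def estimate_probability(left, right, d_min, d_max, seq_len,
                         max_subs=0, trials=2000, alphabet="ACGT", seed=0):
    """Monte Carlo estimate of the probability that a random sequence of
    length seq_len contains at least one occurrence of the motif.
    Assumes i.i.d. uniform letters, which is a simplification."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = "".join(rng.choice(alphabet) for _ in range(seq_len))
        if has_structured_motif(seq, left, right, d_min, d_max, max_subs):
            hits += 1
    return hits / trials
```

As a usage example, estimate_probability("TTGACA", "TATAAT", 16, 18, seq_len=500, max_subs=1) would estimate how often a motif resembling the E. coli -35 and -10 consensus boxes appears by chance in a 500-letter random sequence; these inputs are illustrative only. A simulation of this kind becomes slow and noisy when the probability is small, which is one reason analytical approximations are useful.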

MeSH terms

  • Algorithms
  • Bacillus subtilis / genetics
  • Base Sequence*
  • Computational Biology*
  • Computer Simulation
  • Escherichia coli / genetics
  • Models, Genetic
  • Probability
  • Promoter Regions, Genetic
  • Transcription, Genetic*