In silico prediction of scaffold/matrix attachment regions in large genomic sequences

Genome Res. 2002 Feb;12(2):349-54. doi: 10.1101/gr.206602.

Abstract

Scaffold/matrix attachment regions (S/MARs) are essential regulatory DNA elements of eukaryotic cells. They are major determinants of locus control of gene expression and can shield gene expression from position effects. Experimental detection of S/MARs requires substantial effort and is not suitable for large-scale screening of genomic sequences. In silico prediction of S/MARs can provide a crucial first selection step to reduce the number of candidates. We used experimentally defined S/MAR sequences as the training set and generated a library of new S/MAR-associated, AT-rich patterns described as weight matrices. A new tool called SMARTest was developed that identifies potential S/MARs by performing a density analysis based on the S/MAR matrix library (http://www.genomatix.de/cgi-bin/smartest_pd/smartest.pl). S/MAR predictions were evaluated by using six genomic sequences from animals and plants for which S/MARs and non-S/MARs were experimentally mapped. SMARTest reached a sensitivity of 38% and a specificity of 68%. In contrast to previous algorithms, the SMARTest approach does not depend on the sequence context and is suitable for analyzing long genomic sequences up to the size of whole chromosomes. To demonstrate the feasibility of large-scale S/MAR prediction, we analyzed the recently published chromosome 22 sequence and found 1198 S/MAR candidates.
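The general idea of a density analysis over weight-matrix matches can be illustrated with a minimal Python sketch. This is not the SMARTest implementation: the toy matrix, the uniform background frequency, the score threshold, the window size, and the function names (`log_odds`, `match_positions`, `dense_regions`) are all assumptions made for illustration, and the actual matrix library and parameters are not given in the abstract.

```python
import math

# Hypothetical toy weight matrix (one column of base probabilities per position).
# The values are invented for illustration; they are NOT from the S/MAR matrix library.
TOY_MATRIX = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def log_odds(site, matrix, background=0.25):
    """Sum of log2(p / background) over the site; -inf for unknown bases."""
    score = 0.0
    for base, column in zip(site, matrix):
        if base not in column:
            return float("-inf")
        score += math.log2(column[base] / background)
    return score


def match_positions(seq, matrix, threshold):
    """Start positions where the matrix scores at or above the threshold."""
    width = len(matrix)
    return [
        i
        for i in range(len(seq) - width + 1)
        if log_odds(seq[i:i + width], matrix) >= threshold
    ]


def dense_regions(seq, matrices, threshold, window, min_matches):
    """Merge overlapping windows that contain at least min_matches matrix hits.

    Returns a list of (start, end) tuples, end exclusive.
    """
    hits = sorted(
        p for m in matrices for p in match_positions(seq, m, threshold)
    )
    regions = []
    last_start = max(1, len(seq) - window + 1)
    for start in range(last_start):
        end = min(start + window, len(seq))
        count = sum(1 for p in hits if start <= p < end)
        if count >= min_matches:
            if regions and start <= regions[-1][1]:
                # Extend the previous candidate region.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

Because each window is scored only by how many matrix hits it contains, a scan of this kind runs in a single pass over the sequence and does not need annotation of the surrounding context. Calling `dense_regions(sequence, [TOY_MATRIX], threshold=4.0, window=20, min_matches=5)` on a sequence with an AT-rich stretch returns the merged candidate intervals.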

Publication types

  • Comparative Study
  • Evaluation Study

MeSH terms

  • Algorithms
  • Animals
  • Binding Sites / genetics
  • Chickens
  • Computational Biology / methods*
  • DNA / genetics*
  • DNA / metabolism
  • DNA, Plant / genetics
  • DNA-Binding Proteins / genetics
  • DNA-Binding Proteins / metabolism
  • Databases, Genetic
  • Humans
  • Mice
  • Nuclear Matrix / genetics*
  • Nuclear Matrix / metabolism
  • Nuclear Proteins / genetics
  • Nuclear Proteins / metabolism
  • Oryza / genetics
  • Software

Substances

  • DNA, Plant
  • DNA-Binding Proteins
  • Nuclear Proteins
  • DNA