Integrated sequence-structure motifs suffice to identify microRNA precursors

PLoS One. 2012;7(3):e32797. doi: 10.1371/journal.pone.0032797. Epub 2012 Mar 15.

Abstract

Background: More than 1200 miRNA loci have been annotated in the human genome to date. The specific features that define a miRNA precursor and determine its recognition and subsequent processing have not yet been exhaustively described, and miRNA loci therefore cannot be computationally identified with sufficient confidence.

Results: We rendered pre-miRNA and non-pre-miRNA hairpins as strings of integrated sequence-structure information, and used the software Teiresias to identify sequence-structure motifs (ss-motifs) of variable length in these data sets. A Support Vector Machine (SVM) algorithm for pre-miRNA identification that used only ss-motifs as features achieved 99.2% specificity and 97.6% sensitivity on a human test data set, which is comparable to previously published algorithms employing combinations of sequence-structure and additional features. Further analysis of the information contents of the ss-motifs showed highly significant deviations from those of the respective training sets, providing potential clues as to how the sequence and structural information of RNA hairpins are utilized by the miRNA processing apparatus.
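As a rough illustration of the idea described in the Results, the sketch below merges a hairpin's sequence and dot-bracket structure into one string, turns motif presence into a binary feature vector, and computes sensitivity and specificity. The encoding (uppercase for paired, lowercase for unpaired nucleotides), the function names, and the use of plain substrings as motifs are all assumptions made for illustration; the abstract does not specify the paper's actual alphabet, and Teiresias motifs can be more general than fixed substrings.

```python
def encode_ss(sequence, structure):
    """Merge sequence and dot-bracket structure into one string.

    Hypothetical encoding: paired bases stay uppercase, unpaired
    bases become lowercase. This is not the paper's alphabet.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    symbols = []
    for nt, st in zip(sequence.upper(), structure):
        if st == ".":
            symbols.append(nt.lower())   # unpaired position
        elif st in "()":
            symbols.append(nt)           # paired position
        else:
            raise ValueError(f"unexpected structure symbol: {st!r}")
    return "".join(symbols)


def motif_features(encoded, motifs):
    """Binary vector: 1 if the motif occurs in the encoded hairpin, else 0.

    Motifs are treated as plain substrings here (an assumption).
    """
    return [1 if motif in encoded else 0 for motif in motifs]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = pre-miRNA)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    encoded = encode_ss("GGAUCC", "((..))")   # toy hairpin, not real data
    print(encoded)
    print(motif_features(encoded, ["Gau", "CCg"]))
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

Feature vectors of this kind could then be passed to any SVM implementation; the classifier itself is omitted because the abstract gives no kernel or parameter details.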

Conclusion: Integrated sequence-structure motifs of variable length apparently capture nearly all information required to distinguish miRNA precursors from other stem-loop structures.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Databases, Nucleic Acid / statistics & numerical data
  • Genome, Human
  • Humans
  • MicroRNAs / chemistry*
  • MicroRNAs / genetics*
  • MicroRNAs / metabolism
  • Nucleic Acid Conformation
  • RNA Processing, Post-Transcriptional
  • Software
  • Support Vector Machine

Substances

  • MicroRNAs