EbEST: an automated tool using expressed sequence tags to delineate gene structure

Genome Res. 1998 Mar;8(3):268-75. doi: 10.1101/gr.8.3.268.

Abstract

Large numbers of expressed sequence tags (ESTs) continue to fill public and private databases with partial cDNA sequences. However, using this huge amount of ESTs to facilitate gene finding in genomic sequence imposes a challenge, especially to wet-lab scientists who often have limited computing resources. In an effort to consolidate the information hidden in the vast number of ESTs into a readable and manageable format, we have developed EbEST-a program that automates the process of using ESTs to help delineate gene structure in long stretches of genomic sequence. The EbEST program consists of three functional modules-the first module separates homologous ESTs into clusters and identifies the most informative ESTs within each cluster; the second module uses the informative ESTs to perform gapped alignment and to predict the exon-intron boundary; and the third module generates text file and graphic outputs that illustrate the orientation, exonic structure, and untranslated regions (UTRs) of putative genes in the genomic sequence being analyzed. Evaluation of EbEST with 176 human genes from the ALLSEQ set indicated that it performed in-line with several existing gene finding programs, but was more tolerant to sequencing errors. Furthermore, when EbEST was challenged with query sequences that harbor more than one gene, it suffered only a slight drop in performance, whereas the performance of the other programs evaluated decreased more. EbEST may be used as a stand-alone tool to annotate human genomic sequences with EST-derived gene elements, or can be used in conjunction with computational gene-recognition programs to increase the accuracy of gene prediction. [EbBEST is available at http://EbEST.ifrc.mcw.edu]

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Base Sequence*
  • Computational Biology / methods
  • Exons
  • Gene Expression*
  • Genome, Human
  • Humans
  • Regulatory Sequences, Nucleic Acid
  • Software*