Phylogenomic analysis of EST datasets

José M Peregrín-Alvarez; John Parkinson

doi:10.1007/978-1-60327-136-3_12


Methods Mol Biol. 2009;533:257-76. doi: 10.1007/978-1-60327-136-3_12.

Authors

José M Peregrín-Alvarez¹, John Parkinson

Affiliation

¹ Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, Canada.

PMID: 19277568

Abstract

To date, the genomes of over 600 organisms have been sequenced, of which 100 are eukaryotes. Together with partial genome data for an additional 700 eukaryotic organisms, these exceptional sequence resources offer new opportunities to explore phylogenetic relationships and species diversity. Identifying highly divergent sequences that are specific to an EST-based dataset offers insight into the extent of genetic novelty within that dataset. Sequences shared only with related species from the same taxon may represent genes associated with taxon-specific innovations. Conversely, sequences that are highly conserved across many species provide valuable resources for more in-depth phylogenetic analyses. In this chapter, we guide the reader through examining their sequence datasets in the context of phylogenetic relationships. Performed across large-scale datasets, such analyses are termed phylogenomics. Two complementary approaches are described, both based on BLAST similarity metrics. The first uses an established Java tool, SimiTri, to visualize sequence similarity relationships between the EST dataset and three user-defined datasets. The second uses phylogenetic profiles to identify groups of taxonomically related sequences.
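The phylogenetic-profile approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's own pipeline: it assumes BLAST+ tabular output (`-outfmt 6`, whose 11th and 12th columns are the e-value and bit score), a hypothetical convention in which each subject ID is prefixed with its taxon label (`TAXON|seqid`), and an arbitrary e-value cutoff of 1e-5. Each EST is reduced to a presence/absence profile across an ordered list of taxa, ESTs with identical profiles are grouped, and each profile is loosely labeled as novel (no hits), taxon-specific (hits only in the EST's own taxon) or conserved (hits in at least a hypothetical `min_conserved` number of taxa). The function names and the toy hits in the demo are invented for illustration.

```python
from collections import defaultdict


def parse_blast_tab(lines, evalue_cutoff=1e-5):
    """Collect the best bit score per (query, taxon) from BLAST tabular output.

    Assumes standard 12-column -outfmt 6 lines and subject IDs of the
    hypothetical form 'TAXON|seqid'. Hits worse than evalue_cutoff are ignored.
    """
    best = defaultdict(dict)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > evalue_cutoff:
            continue
        taxon = subject.split("|", 1)[0]
        # keep only the strongest hit of this query to each taxon
        if bitscore > best[query].get(taxon, 0.0):
            best[query][taxon] = bitscore
    return dict(best)


def phylogenetic_profiles(best_hits, queries, taxa):
    """Return a 0/1 presence/absence tuple per query over an ordered taxon list.

    `queries` should list every EST, so that sequences with no hits at all
    still receive an all-zero profile.
    """
    return {
        q: tuple(1 if t in best_hits.get(q, {}) else 0 for t in taxa)
        for q in queries
    }


def group_by_profile(profiles):
    """Group queries that share an identical phylogenetic profile."""
    groups = defaultdict(list)
    for query, profile in profiles.items():
        groups[profile].append(query)
    return dict(groups)


def classify(profile, taxa, own_taxon, min_conserved=3):
    """Loosely label a profile; min_conserved is a hypothetical threshold."""
    hit_taxa = {t for t, present in zip(taxa, profile) if present}
    if not hit_taxa:
        return "novel"            # no detectable homologs in any database
    if hit_taxa == {own_taxon}:
        return "taxon-specific"   # homologs only in the EST's own taxon
    if len(hit_taxa) >= min_conserved:
        return "conserved"        # candidate for deeper phylogenetic analysis
    return "intermediate"


if __name__ == "__main__":
    taxa = ["nematoda", "arthropoda", "chordata", "fungi"]
    # toy, invented BLAST hits for illustration only
    blast_lines = [
        "est1\tnematoda|g1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150.0",
        "est1\tchordata|g2\t50.0\t100\t50\t0\t1\t100\t1\t100\t1e-10\t60.0",
        "est2\tnematoda|g3\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-20\t90.0",
    ]
    best = parse_blast_tab(blast_lines)
    profiles = phylogenetic_profiles(best, ["est1", "est2", "est3"], taxa)
    for profile, members in group_by_profile(profiles).items():
        print(profile, members, classify(profile, taxa, "nematoda"))
```

Binary profiles are the simplest choice. A score-based variant could store bit scores (or normalized bit scores) instead of 0/1, which is closer in spirit to the similarity metrics SimiTri visualizes, but any particular normalization scheme would be an assumption here.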

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Cluster Analysis
Computational Biology / methods*
Computers
Databases, Genetic
Expressed Sequence Tags*
Genomics*
Humans
Phylogeny
Programming Languages
Software
User-Computer Interface