Comparative assessment of performance and genome dependence among phylogenetic profiling methods

BMC Bioinformatics. 2006 Sep 27;7:420. doi: 10.1186/1471-2105-7-420.

Abstract

Background: The rapidly increasing speed with which genome sequence data can be generated will be accompanied by an exponential increase in the number of sequenced eukaryotes. With the increasing number of sequenced eukaryotic genomes comes a need for bioinformatic techniques to aid in functional annotation. Ideally, genome-context-based techniques such as proximity, fusion, and phylogenetic profiling, which have been so successful in prokaryotes, could also be applied to eukaryotes. Here we explore the application to eukaryotic genomes of phylogenetic profiling, a method that exploits the evolutionary co-occurrence of genes to assign functional linkages.
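To make "evolutionary co-occurrence" concrete, the Python sketch below assumes binary profiles (1 = a homolog is detected in that genome, 0 = absent) and scores a gene pair with a hypergeometric upper-tail p-value: the chance that two genes present in n_a and n_b of N genomes would share at least k genomes at random. This is one common scoring choice shown for illustration, not necessarily the scoring used in this study, and the function name is hypothetical.

```python
from math import comb


def hypergeom_cooccurrence_pvalue(profile_a, profile_b):
    """Hypothetical helper: P(X >= k) under the hypergeometric null.

    profile_a, profile_b: equal-length 0/1 lists, one entry per genome.
    A small p-value means the two genes co-occur more often than expected
    by chance, which phylogenetic profiling treats as evidence of a
    functional linkage.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    n_genomes = len(profile_a)
    n_a = sum(profile_a)  # genomes containing gene A
    n_b = sum(profile_b)  # genomes containing gene B
    k = sum(1 for a, b in zip(profile_a, profile_b) if a and b)  # shared genomes

    total = comb(n_genomes, n_b)
    # Sum P(X = i) for every overlap i >= k that is actually possible.
    tail = sum(
        comb(n_a, i) * comb(n_genomes - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total


# Two genes found in exactly the same 3 of 6 genomes: p = 1 / C(6, 3) = 0.05
print(hypergeom_cooccurrence_pvalue([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0]))
```

With only six genomes even a perfect match is just marginally significant, which illustrates why the number and composition of genomes in a profile matter.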

Results: To evaluate the performance of phylogenetic profiling in eukaryotes, we assessed how commonly used profile construction techniques and genome compositions compare in predicting functional linkages in both prokaryotic and eukaryotic organisms. When predicting linkages in E. coli with a prokaryotic profile, profiles built from continuous values derived from transformed BLAST bit-scores performed better than profiles built from discretized E-values, whereas discretized E-values yielded more accurate linkages when S. cerevisiae was the query organism. Extending this analysis by incorporating several eukaryotic genomes into profiles containing a majority of prokaryotes resulted in similar overall accuracy, but with a surprising reduction in pathway diversity among the most significant linkages. Furthermore, phylogenetic profiling with profiles composed only of eukaryotes lost the strong correlation between common KEGG pathway membership and profile similarity score. Profile construction methods, orthology definitions, ontology and domain complexity were explored as possible sources of the poor performance of eukaryotic profiles, but none of these changes improved the results.
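The two profile construction styles compared above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the E-value cutoff of 1e-5, the bit-score transformation (best hit divided by the query's self-hit bit-score, capped at 1), and the pairing of Pearson correlation with continuous profiles and mutual information with binary profiles are all choices made here for illustration. All function names are hypothetical.

```python
import math


def discretized_profile(best_evalues, threshold=1e-5):
    """Binary profile: 1 if the best hit in a genome has E-value <= threshold.

    None means no hit was found in that genome. The 1e-5 cutoff is an assumption.
    """
    return [1 if e is not None and e <= threshold else 0 for e in best_evalues]


def continuous_profile(best_bitscores, self_bitscore):
    """Continuous profile: best bit-score per genome over the query's self-hit
    bit-score, capped at 1.0; genomes with no hit get 0.0 (assumed transform)."""
    return [min(b / self_bitscore, 1.0) if b is not None else 0.0
            for b in best_bitscores]


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile (a convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def mutual_information(x, y):
    """Mutual information in bits between two 0/1 profiles of equal length."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Same hypothetical query scored against four genomes, built both ways.
binary = discretized_profile([1e-30, None, 0.01, 1e-6])           # [1, 0, 0, 1]
continuous = continuous_profile([200.0, None, 50.0, 500.0], 400.0)  # [0.5, 0.0, 0.125, 1.0]
```

The binary profile discards how strong a hit is, while the continuous profile keeps graded similarity. This is the trade-off whose outcome, per the results above, depended on whether E. coli or S. cerevisiae was the query organism.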

Conclusion: Given the current set of completely sequenced eukaryotic organisms, phylogenetic profiling using profiles generated from any of the commonly used techniques was found to yield extremely poor results. These findings imply genome-specific requirements for constructing functionally relevant phylogenetic profiles, and suggest that differences in the evolutionary history between different kingdoms might generally limit the usefulness of phylogenetic profiling in eukaryotes.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Bacterial Proteins
  • Base Sequence
  • Computational Biology / methods*
  • Databases, Genetic
  • Entropy
  • Escherichia coli / classification
  • Escherichia coli / genetics*
  • Fungal Proteins
  • Genome, Bacterial*
  • Genome, Fungal*
  • Phylogeny*
  • Saccharomyces cerevisiae / classification
  • Saccharomyces cerevisiae / genetics*
  • Sequence Homology

Substances

  • Bacterial Proteins
  • Fungal Proteins