Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology

K Sjölander; K Karplus; M Brown; R Hughey; A Krogh; I S Mian; D Haussler

doi:10.1093/bioinformatics/12.4.327

Comput Appl Biosci. 1996 Aug;12(4):327-45. doi: 10.1093/bioinformatics/12.4.327.

Authors

K Sjölander¹, K Karplus, M Brown, R Hughey, A Krogh, I S Mian, D Haussler

Affiliation

¹ Baskin Center for Computer Engineering and Information Sciences, University of California at Santa Cruz 95064, USA. kimmen@cse.ucsc.edu

PMID: 8902360
DOI: 10.1093/bioinformatics/12.4.327

Abstract

We present a method for condensing the information in multiple alignments of proteins into a mixture of Dirichlet densities over amino acid distributions. Dirichlet mixture densities are designed to be combined with observed amino acid frequencies to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model or other statistical model. These estimates give a statistical model greater generalization capacity, so that remotely related family members can be more reliably recognized by the model. This paper corrects the previously published formula for estimating these expected probabilities, and contains complete derivations of the Dirichlet mixture formulas, methods for optimizing the mixtures to match particular databases, and suggestions for efficient implementation.
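
To make the idea of combining a Dirichlet mixture with observed amino acid frequencies concrete, the sketch below computes a posterior mean estimate for one alignment column. It assumes the standard form of this estimator: each mixture component j has a weight q_j and a parameter vector alpha_j, the component's posterior weight is proportional to q_j * B(alpha_j + n) / B(alpha_j), and the estimate for residue i is the weighted average of (n_i + alpha_j,i) / (|n| + |alpha_j|). The function names, the three-letter alphabet, and all numeric values are hypothetical and chosen only for illustration; they are not parameters from the paper.

```python
import math


def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def posterior_mean(counts, weights, alphas):
    """Posterior mean residue probabilities under a Dirichlet mixture prior.

    counts  -- observed counts n_i for one column
    weights -- mixture coefficients q_j (summing to 1)
    alphas  -- one Dirichlet parameter vector alpha_j per component
    """
    # log of q_j * B(alpha_j + n) / B(alpha_j); the multinomial coefficient
    # is the same for every component and cancels on normalization.
    log_post = [
        math.log(q)
        + log_beta([a + n for a, n in zip(alpha, counts)])
        - log_beta(alpha)
        for q, alpha in zip(weights, alphas)
    ]
    # Normalize in log space to avoid underflow.
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    z = sum(unnorm)
    resp = [u / z for u in unnorm]

    total = sum(counts)
    return [
        sum(
            r * (counts[i] + alpha[i]) / (total + sum(alpha))
            for r, alpha in zip(resp, alphas)
        )
        for i in range(len(counts))
    ]


# Hypothetical toy prior over a 3-letter alphabet (real priors use 20 residues).
TOY_WEIGHTS = [0.6, 0.4]
TOY_ALPHAS = [[2.0, 0.5, 0.5], [0.3, 0.3, 2.4]]

if __name__ == "__main__":
    print(posterior_mean([3, 0, 1], TOY_WEIGHTS, TOY_ALPHAS))
```

With no observations the estimate reduces to the prior mean of the mixture, and a residue with zero count still receives nonzero probability, which is the source of the added generalization capacity described in the abstract.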

Publication types

Comparative Study
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms
Bayes Theorem
Databases, Factual
Evaluation Studies as Topic
Models, Statistical*
Monte Carlo Method
Probability Theory
Proteins / genetics*
Sequence Alignment / methods*
Sequence Alignment / statistics & numerical data
Sequence Homology, Amino Acid

Substances

Proteins

Grants and funding

GM17129/GM/NIGMS NIH HHS/United States