A statistical framework to discover true associations from multiprotein complex pull-down proteomics data sets

Proteins. 2006 Aug 1;64(2):436-43. doi: 10.1002/prot.20994.

Abstract

Experimental processes to collect and process proteomics data are increasingly complex, yet the computational methods to assess the quality and significance of these data remain unsophisticated. These challenges have led to many biological oversights and computational misconceptions. We developed an empirical Bayes model to analyze multiprotein complex (MPC) proteomics data derived from peptide mass spectrometry detections of purified protein complex pull-down experiments. Using our model and two yeast proteomics data sets, we estimated that there should be an average of about 20 true associations per MPC, almost 10 times as high as was previously estimated. For data sets generated to mimic a real proteome, our model achieved on average 80% sensitivity in detecting true associations, as compared with the 3% sensitivity in previous work, while maintaining a comparable false discovery rate of 0.3%. Cross-examination of our results with protein complexes confirmed by various experimental techniques demonstrates that many true associations that cannot be identified by the previous approach are identified by our method.
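
As a reading aid, the two evaluation metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions, sensitivity = TP / (TP + FN) and false discovery rate = FP / (TP + FP), and it assumes that associations are represented as sets of protein pairs. The function name and the toy data are hypothetical; they are not taken from the paper, whose exact evaluation procedure is not described in this abstract.

```python
def sensitivity_and_fdr(predicted, truth):
    """Return (sensitivity, false discovery rate) for predicted associations.

    predicted, truth: sets of hashable items, e.g. (protein, protein) pairs.
    Assumes the standard definitions; the paper may compute these differently.
    """
    tp = len(predicted & truth)   # true associations that were reported
    fp = len(predicted - truth)   # reported associations that are not true
    fn = len(truth - predicted)   # true associations that were missed

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, fdr


# Toy example (hypothetical data): five true pairs, five reported pairs.
truth = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")}
predicted = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("X", "Y")}

sens, fdr = sensitivity_and_fdr(predicted, truth)
print(f"sensitivity={sens:.2f}, FDR={fdr:.2f}")
```

In this toy case four of the five true pairs are recovered and one of the five reported pairs is false, so a method can have high sensitivity and a nonzero false discovery rate at the same time; the abstract reports both figures for that reason.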

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Databases, Protein
  • False Positive Reactions
  • Fungal Proteins / chemistry
  • Models, Statistical
  • Multiprotein Complexes
  • Protein Conformation
  • Proteins / chemistry
  • Proteome
  • Proteomics / methods*
  • Reproducibility of Results
  • Saccharomyces cerevisiae / metabolism
  • Sensitivity and Specificity

Substances

  • Fungal Proteins
  • Multiprotein Complexes
  • Proteins
  • Proteome