Bayesian proteoform modeling improves protein quantification of global proteomic measurements

Bobbie-Jo M Webb-Robertson; Melissa M Matzke; Susmita Datta; Samuel H Payne; Jiyun Kang; Lisa M Bramer; Carrie D Nicora; Anil K Shukla; Thomas O Metz; Karin D Rodland; Richard D Smith; Mark F Tardiff; Jason E McDermott; Joel G Pounds; Katrina M Waters

doi:10.1074/mcp.M113.030932

Mol Cell Proteomics. 2014 Dec;13(12):3639-46. doi: 10.1074/mcp.M113.030932.

Affiliations

¹ Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, WA 99354; bj@pnnl.gov.
² Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99354.
³ Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202.
⁴ Omics Technology Development and Production, Pacific Northwest National Laboratory, Richland, WA 99354.
⁵ Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, WA 99354.
⁶ Omics Biological Applications, Pacific Northwest National Laboratory, Richland, WA 99354.
⁷ Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354.

Abstract

As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which has the benefit of offering a systems view of protein expression. However, a major challenge is that, with this increase in throughput, estimating protein quantities from the natively measured peptides has become a computational task. A limitation of existing computationally driven protein quantification methods is that most ignore protein variation, such as alternate splicing of the RNA transcript, post-translational modifications, or other possible proteoforms, which will affect a significant fraction of the proteome. The consequence of ignoring this variation is that statistical inference at the protein level, and consequently downstream analyses such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian Proteoform Quantification model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that fall outside the dominant pattern, or to detect the existence of multiple over-expressed patterns, in order to improve relative protein abundance estimates. It is a research-driven approach that uses the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach assumes that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves accuracy similar to that of current state-of-the-art methods at proteoform identification, with significantly better specificity. BP-Quant is available as MATLAB® and R packages.
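The abstract describes grouping a protein's peptides by statistically derived signatures tied to the experiment's hypothesis tests, then treating peptides outside the dominant pattern as possible alternate proteoforms. The sketch below illustrates only that grouping idea, under loudly stated assumptions: it is not the BP-Quant Bayesian model. A fixed Welch t-statistic cutoff stands in for the paper's hypothesis tests and Bayesian scoring, the "dominant" pattern is simply the most frequent signature, and the names `welch_t`, `signature`, `group_peptides`, and the threshold `t_cut=3.0` are all hypothetical.

```python
"""Hypothetical sketch of peptide-signature grouping in the spirit of BP-Quant.

Assumptions (not from the paper): each peptide gets a signature of -1/0/+1 per
treatment-vs-control comparison from a Welch t-statistic cutoff, and the
"dominant" pattern is the most frequent signature. The real BP-Quant scores
signatures with a Bayesian model instead of counting them.
"""
from collections import Counter
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(control, group):
    """Welch's t statistic for mean(group) - mean(control)."""
    diff = mean(group) - mean(control)
    se = sqrt(variance(control) / len(control) + variance(group) / len(group))
    if se == 0:
        # Zero spread in both groups: any nonzero difference is "infinitely" strong.
        return copysign(inf, diff) if diff else 0.0
    return diff / se


def signature(control, treatments, t_cut=3.0):
    """Return a tuple of -1/0/+1, one entry per treatment group vs. control."""
    sig = []
    for group in treatments:
        t = welch_t(control, group)
        sig.append(0 if abs(t) < t_cut else (1 if t > 0 else -1))
    return tuple(sig)


def group_peptides(peptides, t_cut=3.0):
    """Group peptide names by signature.

    `peptides` maps a peptide name to (control_values, [treatment_values, ...]),
    with values on a log scale. Returns (dominant_signature, groups), where
    groups maps each signature to the list of peptide names sharing it.
    """
    groups = {}
    for name, (control, treatments) in peptides.items():
        groups.setdefault(signature(control, treatments, t_cut), []).append(name)
    counts = Counter({sig: len(names) for sig, names in groups.items()})
    dominant = counts.most_common(1)[0][0]
    return dominant, groups
```

For instance, if two peptides of a protein increase in the treated group while a third is unchanged, the increasing signature becomes dominant and the unchanged peptide is flagged as a candidate for a different proteoform; under this sketch, only dominant-signature peptides would then feed a protein-level abundance estimate.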

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Alternative Splicing
Amino Acid Sequence
Animals
Bayes Theorem
Blood Proteins / analysis*
Blood Proteins / genetics
Blood Proteins / metabolism
Humans
Mice
Molecular Sequence Data
Protein Processing, Post-Translational*
Proteome / analysis*
Proteome / genetics
Proteome / metabolism
Proteomics / methods
Proteomics / statistics & numerical data*
Software*

Substances

Blood Proteins
Proteome
