Precise and parallel characterization of coding polymorphisms, alternative splicing, and modifications in human proteins by mass spectrometry

Mol Cell Proteomics. 2005 Jul;4(7):1002-8. doi: 10.1074/mcp.M500064-MCP200. Epub 2005 Apr 28.

Abstract

The human proteome is a highly complex extension of the genome, in which a single gene often produces distinct protein forms through alternative splicing, RNA editing, polymorphisms, and posttranslational modifications. This biological variation, compounded by the high sequence identity within gene families, currently overwhelms efforts at complete and routine characterization of mammalian proteins by MS. A new database of human proteins and their possible variants was created and searched using tandem mass spectrometric data from intact proteins. This first application of top-down MS/MS to wild-type human proteins demonstrates both gene-specific identification and the unambiguous characterization of multifaceted mass shifts (Δm values). The Δm values found through the precise identification of 45 protein forms from HeLa cells reveal 34 coding single nucleotide polymorphisms, two protein forms arising from alternative splicing, and 12 diverse modifications (not counting simple N-terminal processing), including a previously unknown phosphorylation at 10% occupancy. Automated protein identification was achieved with a median expectation value of 10⁻¹³ and often occurred alongside the dissection of several sources of protein variability acting in combination. Top-down MS therefore holds considerable promise for enabling non-mass spectrometrists to annotate precisely the gene products expressed from the human genome.
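As a rough illustration of what characterizing a Δm value involves, the sketch below computes the shift between an observed intact mass and a database-predicted mass. It then lists single modifications or single amino acid substitutions (as a coding SNP would cause) that could account for that shift. This is not the authors' search software, which the abstract does not describe. The function names, the small modification table, the residue subset, and the 0.01 Da tolerance are all hypothetical choices made for illustration. The mass values themselves are standard monoisotopic masses.

```python
# Hypothetical sketch: explain a mass shift (delta m) between an observed
# intact protein mass and the mass predicted from a database sequence.

# Monoisotopic mass shifts (Da) for a few common modifications.
# The selection is illustrative, not the set used in the paper.
KNOWN_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "oxidation": 15.99491,
    "methylation": 14.01565,
}

# Monoisotopic residue masses (Da) for a subset of amino acids, used to
# compute the shift caused by a single amino acid substitution.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "D": 115.02694, "E": 129.04259,
    "K": 128.09496, "R": 156.10111,
}


def delta_m(observed: float, predicted: float) -> float:
    """Mass shift: observed mass minus database-predicted mass (Da)."""
    return observed - predicted


def candidate_explanations(dm: float, tol: float = 0.01) -> list[str]:
    """List single modifications or substitutions whose shift matches dm.

    The tolerance (Da) is an assumed value for illustration only.
    """
    hits = []
    # Single posttranslational modifications.
    for name, shift in KNOWN_SHIFTS.items():
        if abs(dm - shift) <= tol:
            hits.append(name)
    # Single residue substitutions: shift = variant mass - wild-type mass.
    for wt, wt_mass in RESIDUE_MASS.items():
        for var, var_mass in RESIDUE_MASS.items():
            if wt != var and abs(dm - (var_mass - wt_mass)) <= tol:
                hits.append(f"{wt}->{var} substitution")
    return sorted(hits)
```

For example, `candidate_explanations(15.99491)` returns both `"oxidation"` and `"A->S substitution"`. An intact-mass shift alone can therefore be ambiguous. Localizing the shift with fragment-ion (MS/MS) data is what lets such cases be resolved.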

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Alternative Splicing*
  • Amino Acid Sequence
  • Animals
  • Computational Biology
  • Databases, Protein
  • HeLa Cells
  • Humans
  • Mass Spectrometry
  • Molecular Sequence Data
  • Nuclear Proteins / analysis*
  • Nuclear Proteins / genetics
  • Phosphoproteins / analysis
  • Phosphoproteins / genetics
  • Polymorphism, Single Nucleotide*
  • Protein Processing, Post-Translational*
  • Proteomics

Substances

  • Nuclear Proteins
  • Phosphoproteins