Normalization regarding non-random missing values in high-throughput mass spectrometry data

Pei Wang; Hua Tang; Heidi Zhang; Jeffrey Whiteaker; Amanda G Paulovich; Martin Mcintosh

Pac Symp Biocomput. 2006:315-26.

Authors

Pei Wang¹, Hua Tang, Heidi Zhang, Jeffrey Whiteaker, Amanda G Paulovich, Martin Mcintosh

Affiliation

¹ Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA. pwang@fhcrc.org

PMID: 17094249

Abstract

We propose a two-step normalization procedure for high-throughput mass spectrometry (MS) data; normalization is a necessary step in biomarker clustering or classification. First, a global normalization step removes sources of systematic variation between MS profiles due to, for instance, varying amounts of sample degradation over time. A probability model is then used to investigate intensity-dependent missing events and to provide possible substitutions for the missing values. We illustrate the performance of the method with an LC-MS data set of synthetic protein mixtures.
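
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not specify the normalization rule or the probability model, so the sketch assumes median centering of log-intensities for the global step, a simple binned missing-rate summary in place of a fitted probability model, and a per-profile minimum as the substitution rule. All function names and the toy data are hypothetical.

```python
import numpy as np

def global_normalize(log_x):
    """Assumed global step: shift each profile (column) so that its median
    observed log-intensity equals the median over all observed values."""
    col_median = np.nanmedian(log_x, axis=0)
    return log_x - col_median + np.nanmedian(log_x)

def missing_rate_by_intensity(log_x, n_bins=2):
    """Assumed stand-in for a probability model: group features (rows) by
    mean observed intensity, lowest group first, and report the fraction
    of missing values in each group. Every feature must be observed in at
    least one profile."""
    feature_mean = np.nanmean(log_x, axis=1)
    missing_frac = np.isnan(log_x).mean(axis=1)
    order = np.argsort(feature_mean)
    return [float(missing_frac[g].mean()) for g in np.array_split(order, n_bins)]

def impute_low(log_x):
    """Assumed substitution rule: replace each missing value with the
    smallest observed value in the same profile."""
    floor = np.nanmin(log_x, axis=0)
    return np.where(np.isnan(log_x), floor, log_x)

# Toy log-intensities: rows are features, columns are MS profiles.
# Missing values (NaN) occur only in the low-intensity features.
profiles = np.array([
    [10.0, 11.0, 10.5],
    [8.0, 9.0, 8.5],
    [6.0, np.nan, 6.5],
    [5.0, 6.0, np.nan],
])

normalized = global_normalize(profiles)
rates = missing_rate_by_intensity(normalized, n_bins=2)
filled = impute_low(normalized)
```

In this toy example the low-intensity group has a higher missing rate than the high-intensity group, which is the kind of intensity-dependent missingness the abstract refers to. Note that a column median computed only from observed values is itself affected by which values are missing, so the two steps are not fully independent in practice.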

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Chromatography, Liquid
Computational Biology
Mass Spectrometry / statistics & numerical data*
Models, Statistical
Probability
Proteins / isolation & purification

Substances

Proteins

Grants and funding

CA86368/CA/NCI NIH HHS/United States