Correlation test to assess low-level processing of high-density oligonucleotide microarray data

Alexander Ploner; Lance D Miller; Per Hall; Jonas Bergh; Yudi Pawitan

doi:10.1186/1471-2105-6-80

BMC Bioinformatics. 2005 Mar 31;6:80. doi: 10.1186/1471-2105-6-80.

Authors

Alexander Ploner¹, Lance D Miller, Per Hall, Jonas Bergh, Yudi Pawitan

Affiliation

¹ Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden. Alexander.Ploner@meb.ki.se

Abstract

Background: There are currently a number of competing techniques for low-level processing of oligonucleotide array data. The choice of technique has a profound effect on subsequent statistical analyses, but there is no method to assess whether a particular technique is appropriate for a specific data set, without reference to external data.

Results: We analyzed coregulation between genes, measured as statistical correlation, in order to detect insufficient normalization between arrays. In a large collection of genes, a randomly chosen pair of genes should have zero correlation on average, which provides the basis for a correlation test. For all data sets that we evaluated, and for the three most commonly used low-level processing procedures (MAS5, RMA and MBEI), the housekeeping-gene normalization failed the test. For a real clinical data set, RMA and MBEI showed significant correlation for absent genes. We also found that a second round of normalization on the probe set level improved normalization significantly throughout.
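
The correlation idea described above can be illustrated with a minimal Python sketch. It is not the authors' test statistic or implementation: the function name `random_pair_correlations`, the simulated data, and the choice to summarize the result by the mean Pearson correlation are all assumptions made for illustration. The sketch draws random gene pairs from a genes-by-arrays expression matrix and shows that a shared per-array shift, standing in for insufficient normalization between arrays, pushes the average correlation away from zero.

```python
import numpy as np


def random_pair_correlations(expr, n_pairs, rng):
    """Pearson correlations for randomly chosen pairs of distinct genes.

    expr: 2-D float array, rows = genes (probe sets), columns = arrays.
    Hypothetical helper for illustration; assumes no gene is constant
    across arrays (a constant row would cause a division by zero).
    """
    n_genes = expr.shape[0]
    i = rng.integers(0, n_genes, size=n_pairs)
    # An offset in 1..n_genes-1 guarantees that j differs from i.
    j = (i + rng.integers(1, n_genes, size=n_pairs)) % n_genes
    # Center each gene across arrays and scale it to unit length, so the
    # dot product of two rows equals their Pearson correlation.
    z = expr - expr.mean(axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return np.sum(z[i] * z[j], axis=1)


# Simulated data (an assumption, not data from the paper).
rng = np.random.default_rng(0)
n_genes, n_arrays = 2000, 30

# Independent genes: random pairs should correlate around zero on average.
clean = rng.normal(size=(n_genes, n_arrays))

# The same genes plus a shift shared by every gene on a given array,
# mimicking an array effect that normalization failed to remove.
array_effect = rng.normal(size=(1, n_arrays))
shifted = clean + array_effect

r_clean = random_pair_correlations(clean, 5000, rng)
r_shifted = random_pair_correlations(shifted, 5000, rng)

print("mean correlation, independent genes:", r_clean.mean())
print("mean correlation, shared array shift:", r_shifted.mean())
```

In this simulation the first mean should be close to zero and the second clearly positive. A formal test would also need to account for the dependence between pairs that share a gene, which this sketch ignores.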

Conclusion: Previous evaluation of low-level processing in the literature has been limited to artificial spike-in and mixture data sets. In the absence of a known gold standard, the correlation criterion allows us to assess the appropriateness of low-level processing of a specific data set and the success of normalization for subsets of genes.

MeSH terms

Algorithms
Cluster Analysis
Computational Biology
DNA Probes
Data Interpretation, Statistical
Databases, Genetic
Gene Expression Profiling*
Gene Expression Regulation
Gene Expression Regulation, Neoplastic
Humans
Models, Genetic
Models, Statistical
Oligonucleotide Array Sequence Analysis / methods*
Oligonucleotides / chemistry
RNA Probes
Research Design
Sensitivity and Specificity
Software

Substances

DNA Probes
Oligonucleotides
RNA Probes