AMDBNorm: an approach based on distribution adjustment to eliminate batch effects of gene expression data

Xu Zhang; Zhiqiang Ye; Jing Chen; Feng Qiao

doi:10.1093/bib/bbab528

Brief Bioinform. 2022 Jan 17;23(1):bbab528. doi: 10.1093/bib/bbab528.

Authors

Xu Zhang¹, Zhiqiang Ye², Jing Chen³, Feng Qiao⁴

Affiliations

¹ School of Mathematics and Statistics, Southwest University, China.
² Chongqing Normal University, China.
³ School of Science, Southwest University of Science and Technology, China.
⁴ Southwest University, China.

PMID: 34958674
DOI: 10.1093/bib/bbab528

Abstract

Batch effects account for a large part of the noise that arises when gene expression datasets are merged, so removing the irrelevant variation they introduce plays an important role in gene expression studies. To obtain reliable differential analysis results, the variation caused by differing technical conditions between batches must be removed while biological variation is preserved. Directly merging data that contain batch effects usually leads to a sharp rise in false positives. Several batch correction methods have been developed, but they have drawbacks. In this study, we develop a new algorithm, adjustment mean distribution-based normalization (AMDBNorm), which uses a probability distribution to correct batch effects while preserving biological variation and is designed to address the shortcomings of existing batch correction methods. We compared AMDBNorm with several popular batch correction methods on two real gene expression datasets with batch effects and analyzed the correction results from both visual and quantitative perspectives. To check that biological variation was preserved, the effects of the batch correction methods were verified by hierarchical cluster analysis. The results showed that AMDBNorm removed batch effects from gene expression data effectively and retained more biological variation than the other methods. Our approach provides researchers with reliable data support for differential gene expression analysis and prognostic biomarker selection.
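
The abstract does not give the steps of AMDBNorm, so the following is only an illustrative sketch of the two general ideas named in the keywords, distribution alignment and mean adjustment, and should not be read as the authors' algorithm. It assumes a genes-by-samples matrix, one batch label per sample, and a chosen reference batch; the function name `align_to_reference` and its parameters are hypothetical.

```python
import numpy as np

def align_to_reference(expr, batch, ref_batch):
    """Illustrative batch alignment (NOT the published AMDBNorm procedure).

    expr      : array of shape (genes, samples)
    batch     : 1-D array with one batch label per sample
    ref_batch : label of the batch used as the reference
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    ref = expr[:, batch == ref_batch]

    # Assumed reference distribution: mean of the sorted reference samples.
    target = np.sort(ref, axis=0).mean(axis=1)
    ref_gene_means = ref.mean(axis=1)

    out = expr.copy()
    for b in np.unique(batch):
        if b == ref_batch:
            continue  # reference samples are left untouched
        cols = np.where(batch == b)[0]

        # Step 1 (distribution alignment): replace each value by the
        # reference quantile that has the same within-sample rank.
        for j in cols:
            ranks = np.argsort(np.argsort(expr[:, j]))
            out[:, j] = target[ranks]

        # Step 2 (mean adjustment): shift each gene so that its mean in
        # this batch equals its mean in the reference batch.
        shift = ref_gene_means - out[:, cols].mean(axis=1)
        out[:, cols] += shift[:, None]
    return out
```

Matching per-gene means across batches is only reasonable when the batches have a similar biological composition; if one batch contains mostly tumor samples and another mostly normal samples, this simple shift would also remove genuine biological differences, which is the kind of loss the abstract says a good correction method must avoid.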

Keywords: batch effects; biological variation; biomarker; distribution alignment; gene expression data; mean adjustment.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Biomarkers
Cluster Analysis
Deep Learning
Gene Expression Profiling / methods*
Gene Expression*
Humans
Neoplasms / genetics
Reproducibility of Results
Software*

Substances

Biomarkers