Less is more: relative rank is more informative than absolute abundance for compositional NGS data

Xubin Zheng; Nana Jin; Qiong Wu; Ning Zhang; Haonan Wu; Yuanhao Wang; Rui Luo; Tao Liu; Wanfu Ding; Qingshan Geng; Lixin Cheng

doi:10.1093/bfgp/elae045

Brief Funct Genomics. 2024 Nov 20:elae045. doi: 10.1093/bfgp/elae045. Online ahead of print.

Authors

Xubin Zheng¹ ² ³, Nana Jin¹ ², Qiong Wu⁴, Ning Zhang¹ ², Haonan Wu¹ ², Yuanhao Wang¹ ², Rui Luo⁵, Tao Liu⁶, Wanfu Ding¹, Qingshan Geng¹, Lixin Cheng¹ ²

Affiliations

¹ Guangdong Provincial Clinical Research Center for Geriatrics, Shenzhen Clinical Research Center for Geriatrics, Shenzhen People's Hospital, Luohu District, Shenzhen 518020, China.
² Health Data Science Center, Shenzhen People's Hospital (First Affiliated Hospital of Southern University of Science and Technology), Luohu District, Shenzhen 518020, China.
³ School of Computing and Information Technology, Great Bay University, Dongguan 523000, Guangdong, China.
⁴ School of Basic Medicine, North Sichuan Medical College, Nanchong 637000, Sichuan, China.
⁵ Department of Systems Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR.
⁶ International Digital Economy Academy (IDEA), Futian District, Shenzhen 518020, China.

PMID: 39568388
DOI: 10.1093/bfgp/elae045

Abstract

High-throughput gene expression data have been extensively generated and utilized in biological mechanism investigations, biomarker detection, and disease diagnosis and prognosis. These applications encompass not only bulk transcriptome data but also single-cell RNA-seq data. However, extracting reliable biological information from transcriptome data remains challenging due to the constraints of Compositional Data Analysis. Current data preprocessing methods, including dataset normalization and batch effect correction, are insufficient to address these issues and improve data quality for downstream analysis. Alternatively, qualification methods focusing on the relative order of gene expression (ROGER) are more informative than quantification methods that rely on gene expression abundance. The Pairwise Analysis of Gene expression method is an enhancement of ROGER, designed for data integration in either sample space or feature space. In this review, we summarize the methods applied to transcriptome data analysis and discuss their potential in predicting clinical outcomes.

Keywords: compositional data analysis; data integration; pairwise analysis; relative expression; transcriptome.
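
The contrast the abstract draws between relative order and absolute abundance can be illustrated with a minimal sketch. It is not the authors' ROGER or Pairwise Analysis of Gene expression implementation; the function names and the counts below are hypothetical and chosen only for illustration. The sketch assumes that a change in sequencing depth acts as a single positive scale factor on all genes within a sample, in which case the within-sample pairwise ordering of genes is unchanged while the raw abundances are not.

```python
from itertools import combinations


def to_proportions(counts):
    """Closure operation: rescale one sample's counts so they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def pairwise_order(values):
    """Map each gene pair (i, j), i < j, to True when gene i exceeds gene j."""
    return {
        (i, j): values[i] > values[j]
        for i, j in combinations(range(len(values)), 2)
    }


# Hypothetical read counts for four genes in one sample (illustrative only).
sample = [120, 30, 450, 75]

# The same sample under an assumed 3x sequencing depth: every count scales.
deeper = [3 * c for c in sample]

# Absolute abundances differ between the two runs ...
abundance_changed = sample != deeper

# ... but the within-sample pairwise ordering is identical,
order_preserved = pairwise_order(sample) == pairwise_order(deeper)

# and it also survives conversion to proportions (the compositional view).
order_preserved_after_closure = (
    pairwise_order(to_proportions(sample)) == pairwise_order(sample)
)
```

Under these assumptions, a classifier built on the pairwise comparisons would see the same features from both runs, whereas one built on raw counts would first need the two runs to be normalized against each other.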
