Identification of a three-gene expression signature and construction of a prognostic nomogram predicting overall survival in lung adenocarcinoma based on TCGA and GEO databases

Transl Lung Cancer Res. 2022 Jul;11(7):1479-1496. doi: 10.21037/tlcr-22-444.

Abstract

Background: Lung adenocarcinoma (LUAD) is a leading cause of cancer mortality. Traditional prognostic factors have limited predictive value once other parameters are taken into account. Thus, a more reliable prognostic model that combines gene expression with clinical parameters is needed.

Methods: Messenger RNA (mRNA) expression data and clinical information from The Cancer Genome Atlas (TCGA)-LUAD dataset and microarray data from three Gene Expression Omnibus (GEO) datasets were obtained. We identified differentially expressed genes (DEGs) between lung tumor and normal tissues through integrated analysis of the three GEO datasets. Univariate and multivariate Cox regression analyses were conducted to select survival-associated DEGs and to establish a prognostic gene signature associated with overall survival (OS). Protein expression of these genes was assessed in 180 LUAD tissue microarrays (TMAs) by immunohistochemistry (IHC). We verified the signature's predictive performance with Kaplan-Meier (KM) curves, receiver operating characteristic (ROC) curves, and Harrell's concordance index (C-index), and validated it in external GEO datasets. Multivariate Cox regression analysis was performed to identify the significant prognostic indicators in LUAD. Furthermore, we established a prognostic nomogram based on the TCGA-LUAD dataset.
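
As an illustration of two quantities named in the Methods, the sketch below computes a Cox-style risk score (a coefficient-weighted sum of expression values) and Harrell's C-index in plain Python. It is a minimal sketch, not the authors' pipeline: the gene names (GENE_A, GENE_B, GENE_C), coefficients, and patient records are hypothetical placeholders, and pairs with tied survival times are simply skipped for brevity.

```python
from itertools import combinations


def risk_score(expression, coefficients):
    """Linear predictor: sum over genes of (Cox coefficient * expression)."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def harrell_c_index(times, events, scores):
    """Fraction of comparable patient pairs in which the patient with the
    shorter survival time also has the higher risk score.

    A pair is comparable only if the patient with the shorter follow-up
    had an observed event (event flag 1). Tied scores count as 0.5.
    Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    return concordant / comparable


# Hypothetical coefficients and patients, for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.3, "GENE_C": 0.2}
patients = [
    # (expression values, follow-up time in months, event: 1 = death, 0 = censored)
    ({"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 1.0}, 10, 1),
    ({"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 0.0}, 30, 0),
    ({"GENE_A": 1.5, "GENE_B": 0.5, "GENE_C": 2.0}, 20, 1),
    ({"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 0.5}, 40, 1),
]

scores = [risk_score(expr, coefficients) for expr, _, _ in patients]
times = [t for _, t, _ in patients]
events = [e for _, _, e in patients]
print("C-index:", harrell_c_index(times, events, scores))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of survival times by risk score.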

Results: A three-gene signature was constructed to predict the OS of LUAD patients. KM analysis, the ROC curve, and the C-index indicated good predictive ability of the gene signature in the TCGA dataset [P<0.0001; C-index 0.6375; 95% confidence interval (CI): 0.5632-0.7118; area under the ROC curve (AUC) 0.674] and in the external GEO datasets (P=0.05, 0.004, and 0.04, respectively). Univariate and multivariate Cox regression analyses also verified that LUAD patients with low risk scores had a lower risk of death than those with high risk scores in the TCGA dataset [hazard ratio (HR) =0.3898; 95% CI: 0.1938-0.7842; P<0.05]. Finally, we constructed a nomogram integrating the gene signature and clinicopathological parameters (P<0.0001; C-index 0.762; 95% CI: 0.714-0.845; AUC 0.8136). Compared with conventional staging, the nomogram can effectively improve prognosis prediction.
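
The reported hazard ratio and its confidence interval can be related to each other with a short calculation. The sketch below assumes the interval is a Wald-type 95% CI that is symmetric on the log scale, which is common for Cox models but is not stated in the abstract; the function name `wald_summary` is a hypothetical helper. Under that assumption it recovers the standard error of log(HR), the z statistic, and an approximate two-sided P value.

```python
import math


def wald_summary(hr, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate log(HR), its standard error, z, and a two-sided
    normal P value from a hazard ratio and its 95% CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P under a normal approximation
    return log_hr, se, z, p


# Values reported for the low-risk vs. high-risk comparison in the TCGA dataset.
log_hr, se, z, p = wald_summary(0.3898, 0.1938, 0.7842)
print(f"log(HR)={log_hr:.4f}  SE={se:.4f}  z={z:.3f}  P={p:.4f}")
```

With these inputs the back-calculated P value falls below 0.05, in line with the reported result, and log(HR) sits at the midpoint of the log-scale interval, as expected for a Wald-type CI.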

Conclusions: The nomogram is closely associated with the OS of LUAD patients. This finding may benefit individualized treatment and clinical decision-making.

Keywords: Gene Expression Omnibus (GEO); The Cancer Genome Atlas (TCGA); lung adenocarcinoma (LUAD); overall survival (OS).