Identification of novel gene signature for lung adenocarcinoma by machine learning to predict immunotherapy and prognosis

Front Immunol. 2023 Jul 31:14:1177847. doi: 10.3389/fimmu.2023.1177847. eCollection 2023.

Abstract

Background: Lung adenocarcinoma (LUAD), a common type of lung cancer, has a 5-year overall survival rate below 20% among patients with advanced disease. This study aims to construct a risk model to guide immunotherapy in LUAD patients effectively.

Materials and methods: LUAD bulk RNA-seq data for model construction, single-cell RNA sequencing (scRNA-seq) data (GSE203360) for cell cluster analysis, and microarray data (GSE31210) for validation were collected from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. We used the Seurat R package to filter and process the scRNA-seq data. Sample clustering was performed with the ConsensusClusterPlus R package. Differentially expressed genes (DEGs) between two groups were identified with the limma R package. MCP-counter, CIBERSORT, ssGSEA, and ESTIMATE were employed to evaluate immune characteristics. Univariate Cox analysis, Lasso regression analysis, and stepwise multivariate analysis were conducted to identify key prognostic genes, which were then used to construct the risk model. Expression of the key prognostic genes was examined by RT-qPCR and Western blot assay.

Results: A total of 27 immune cell marker genes associated with prognosis were identified and used to subtype LUAD samples into clusters C1, C2, and C3. C1 had the longest overall survival and the highest immune infiltration, followed by C2 and then C3. Oncogenic pathways such as VEGF, EGFR, and MAPK were more activated in C3 than in the other two clusters. Based on the DEGs among clusters, we identified seven key prognostic genes: CPA3, S100P, PTTG1, LOXL2, MELTF, PKP2, and TMPRSS11E. The two risk groups defined by the seven-gene risk model differed in their responses to immunotherapy and chemotherapy, immune infiltration, and prognosis. The mRNA and protein levels of CPA3 were decreased, while those of the remaining six genes were increased, in clinical tumor tissues.
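As an illustration of how a seven-gene risk model of this kind is commonly applied, the sketch below computes a risk score as a weighted sum of gene expression values and splits samples into two risk groups. The coefficient values, the median cutoff, and the function names are assumptions made for this example; the abstract reports neither the fitted coefficients nor the cutoff the authors used.

```python
from statistics import median

# HYPOTHETICAL coefficients, for illustration only. The abstract does not
# report the fitted Cox coefficients; these values are placeholders.
HYPOTHETICAL_COEFS = {
    "CPA3": -0.20,
    "S100P": 0.10,
    "PTTG1": 0.15,
    "LOXL2": 0.12,
    "MELTF": 0.08,
    "PKP2": 0.09,
    "TMPRSS11E": 0.11,
}


def risk_score(expression, coefs=HYPOTHETICAL_COEFS):
    """Weighted sum of expression values over the genes in the model."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def assign_risk_groups(samples, coefs=HYPOTHETICAL_COEFS):
    """Label each sample 'high' or 'low' risk.

    Assumption: samples scoring strictly above the cohort median are
    'high'; the cutoff used in the study may differ.
    """
    scores = {sid: risk_score(expr, coefs) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    return {sid: "high" if s > cutoff else "low" for sid, s in scores.items()}


# Toy expression profiles (made-up numbers, not study data).
toy_samples = {
    "s1": {gene: 0.0 for gene in HYPOTHETICAL_COEFS},
    "s2": {gene: 1.0 for gene in HYPOTHETICAL_COEFS},
    "s3": {**{gene: 0.0 for gene in HYPOTHETICAL_COEFS}, "CPA3": 5.0},
}
toy_groups = assign_risk_groups(toy_samples)
```

In practice the coefficients would come from the fitted multivariate Cox model, and expression values would typically be normalized before scoring.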

Conclusion: Immune cell markers are effective in clustering LUAD samples into different subtypes, and they play important roles in regulating the immune microenvironment and cancer development. In addition, the seven-gene risk model may help guide personalized treatment for LUAD patients.

Keywords: immune cells; immunotherapy; lung adenocarcinoma; molecular subtyping; risk model; single-cell analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenocarcinoma of Lung* / genetics
  • Adenocarcinoma of Lung* / therapy
  • Humans
  • Immunotherapy
  • Lung Neoplasms* / genetics
  • Lung Neoplasms* / therapy
  • Machine Learning
  • Prognosis
  • Tumor Microenvironment / genetics

Grants and funding

This work was supported by the National Natural Science Foundation of China (82003199 to JS); the Natural Science Foundation of Zhejiang Province (LY22H160013 to JS); the Natural Science Foundation of Ningbo (202003N4205 to JS); the Research Foundation of Ningbo Institute of Life and Health Industry, University of Chinese Academy of Sciences (2020YJY0209 to JS); the Ningbo Health Branding Subject Fund (PPXK2018-05); the Natural Science Foundation of Zhejiang Province (LHDMD23H160001); the Science and Technology Innovation Guidance Fund Project of Hangzhou Medical College (grant no. CX2022006); the Hwamei Fund (grant no. 2022HMKY37); the Medical Health Science and Technology Project of Zhejiang Provincial Health Commission (grant nos. 2021KY1009 and 2022KY1138); and the Key Discipline of Hwamei Hospital, University of Chinese Academy of Sciences (grant no. 2020ZDXK03).