Machine learning-driven estimation of mutational burden highlights DNAH5 as a prognostic marker in colorectal cancer

Biol Direct. 2024 Nov 14;19(1):116. doi: 10.1186/s13062-024-00564-0.

Abstract

Background: Tumor Mutational Burden (TMB) has emerged as a pivotal predictive biomarker of prognosis and response to immunotherapy in colorectal cancer (CRC) patients. While Whole Exome Sequencing (WES) stands as the gold standard for TMB assessment, it carries substantial costs and demands considerable time. Additionally, the heterogeneity among high-TMB patients remains poorly characterized.

Methods: We employed eight advanced machine learning algorithms to develop gene-panel-based models for TMB estimation. To rigorously compare and validate these TMB estimation models, four external cohorts, involving 1,956 patients, were used. Furthermore, we computed the Pearson correlation coefficient between the estimated TMB and tumor neoantigen levels to elucidate their association. CD8+ tumor-infiltrating lymphocyte (TIL) density was assessed via immunohistochemistry.
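The correlation step described in the Methods can be illustrated with a minimal sketch. The function name `pearson_r` and the toy values below are assumptions made for illustration only; they are not the authors' code or data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the centered values.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (NOT data from the study): one entry per patient.
estimated_tmb = [2.0, 4.0, 6.0, 8.0, 10.0]
neoantigen_load = [11.0, 19.0, 32.0, 41.0, 48.0]

r = pearson_r(estimated_tmb, neoantigen_load)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that patients with higher estimated TMB also tend to carry more neoantigens, which is the kind of association the Methods set out to test.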

Results: The TMB estimation model based on the Lasso algorithm, incorporating 20 genes, exhibited satisfactory performance across multiple independent cohorts (R² ≥ 0.859). This 20-gene TMB model proved to be an independent prognostic indicator for the progression-free survival (PFS) of CRC patients (p = 0.001). DNAH5 mutations were associated with a more favorable prognosis in high-TMB CRC patients, and correlated strongly with tumor neoantigen levels and CD8+ TIL density.
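To make the idea of a Lasso-based panel model concrete, the sketch below fits a Lasso regression by coordinate descent on a toy binary mutation matrix and scores it with the coefficient of determination. Everything in it is an assumption for illustration: the three unnamed panel genes, the toy TMB values, the penalty `alpha`, and the helper names `lasso_fit` and `r_squared`. The abstract does not state how the authors fitted their model or whether their R² is defined exactly this way.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the core of the Lasso update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, alpha, n_iter=100):
    """Minimise (1/(2n)) * ||y - b0 - Xw||^2 + alpha * ||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering X and y lets the unpenalised intercept be recovered afterwards.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # residual while all weights are zero
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant column (gene never or always mutated)
            # Covariance of gene j with the residual that excludes gene j itself.
            rho = sum(c * r for c, r in zip(col, residual)) / n + z * w[j]
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
            w[j] = new_w
    intercept = y_mean - sum(wj * m for wj, m in zip(w, x_mean))
    return intercept, w


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: rows are samples, columns are mutation flags (0/1)
# for three unnamed panel genes. The third gene does not affect the toy TMB.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
wes_tmb = [2.0 + 5.0 * a + 3.0 * b for a, b, _ in X]

intercept, weights = lasso_fit(X, wes_tmb, alpha=0.05)
predicted = [intercept + sum(wj * xj for wj, xj in zip(weights, row)) for row in X]
r2 = r_squared(wes_tmb, predicted)
print(weights, round(r2, 4))
```

In this toy setting the L1 penalty drives the weight of the uninformative gene to exactly zero while only slightly shrinking the informative ones. That sparsity is the property that makes Lasso a natural way to reduce a large candidate list to a small gene panel.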

Conclusions: The 20-gene model offers a cost-efficient approach to precisely estimating TMB and provides prognostic information for patients with CRC. Incorporating DNAH5 within this model further refines the categorization of patients with elevated TMB. Utilizing the 20-gene model facilitates the stratification of patients with CRC, enabling more precise treatment planning.

Keywords: Colorectal cancer; Machine learning; Prognostic biomarker; Tumor mutation burden; Tumor neoantigen burden.

MeSH terms

  • Biomarkers, Tumor* / genetics
  • Colorectal Neoplasms* / genetics
  • Female
  • Humans
  • Lymphocytes, Tumor-Infiltrating
  • Machine Learning*
  • Male
  • Middle Aged
  • Mutation*
  • Prognosis

Substances

  • Biomarkers, Tumor