Machine learning combining multi-omics data and network algorithms identifies adrenocortical carcinoma prognostic biomarkers

Front Mol Biosci. 2023 Nov 6;10:1258902. doi: 10.3389/fmolb.2023.1258902. eCollection 2023.

Abstract

Background: Rare endocrine cancers such as adrenocortical carcinoma (ACC) present a serious diagnostic and prognostic challenge. Knowledge of ACC pathogenesis is incomplete, and patients have limited therapeutic options. Identifying molecular drivers and effective biomarkers is required to diagnose the disease in a timely manner and to stratify patients so that they can be offered the most beneficial treatments. In this study we demonstrate how machine learning methods that integrate multi-omics data, in combination with systems biology tools, can contribute to the identification of new prognostic biomarkers for ACC.

Methods: ACC gene expression and DNA methylation datasets were downloaded from the Xena Browser (GDC TCGA Adrenocortical Carcinoma cohort). A highly correlated multi-omics signature discriminating groups of samples was identified with the Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO) method. Additional regulators of the identified signature were discovered using Clarivate CBDD (Computational Biology for Drug Discovery) network propagation and hidden nodes algorithms on a curated network of molecular interactions (MetaBase™). The discriminative power of the multi-omics signature and its regulators was assessed by training a random forest classifier on 55 samples, using 10-fold cross-validation with five iterations. The prognostic value of the identified biomarkers was further assessed on an external ACC dataset obtained from GEO (GSE49280) using the Kaplan-Meier estimator. Finally, an optimal prognostic signature was derived using stepwise selection by the Akaike Information Criterion (AIC), which allowed samples to be categorized into high- and low-risk groups.

Results: A multi-omics signature including genes, microRNAs and methylation sites was generated. Systems biology tools identified additional genes regulating the features included in the multi-omics signature. The RNA-seq, miRNA-seq and DNA methylation feature sets showed high power to discriminate patients in stages I-II from those in stages III-IV, outperforming previously identified prognostic biomarkers. In an independent dataset, association of the signature genes with overall survival (OS) showed that patients with differential expression levels of 8 genes and 4 microRNAs had a statistically significant decrease in OS. We also found an independent prognostic signature for ACC with potential use in clinical practice, combining 9 gene/microRNA features, that successfully predicted high-risk ACC patients.

Conclusion: Machine learning and integrative analysis of multi-omics data, in combination with Clarivate CBDD systems biology tools, identified a set of biomarkers with high prognostic value for ACC. Multi-omics data are a promising resource for the identification of drivers and new prognostic biomarkers in rare diseases that could be used in clinical practice.
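
As an illustration of the Kaplan-Meier estimator used for the survival analysis, the following is a minimal sketch in plain Python. It is not the authors' code. The function name `kaplan_meier`, the two groups and all follow-up values are hypothetical toy data, not values from the TCGA or GSE49280 cohorts. The abstract does not state how patients were split into expression groups, so the grouping here is purely an assumption for demonstration.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each patient
    events:    1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # patients leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (months of follow-up, event indicator).
high_expr = kaplan_meier([5, 8, 12, 12, 20, 25], [1, 0, 1, 1, 0, 1])
low_expr = kaplan_meier([10, 30, 42, 60, 60, 72], [0, 1, 0, 1, 0, 0])

for label, curve in (("high", high_expr), ("low", low_expr)):
    for t, s in curve:
        print(f"{label} expression: S({t}) = {s:.3f}")
```

In practice the two curves would be compared with a log-rank test, which this sketch omits.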

Keywords: ACC; machine learning; multi-omics; prognostic biomarkers; survival analysis; systems biology.

Grants and funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.