Pathway-based clustering methods have been proposed to explore tumor heterogeneity. However, such methods currently have the drawback that the pathways of interest must be specified explicitly in advance. We developed PathClustNet, a pathway-based clustering algorithm designed to identify cancer subtypes. The method first detects gene clusters and identifies the overrepresented pathways associated with them. It then reveals cancer subtypes by clustering analysis based on the pathway enrichment scores. We applied the method to TCGA pan-cancer data and identified four pan-cancer subtypes, termed C1, C2, C3, and C4. C1 exhibited high metabolic activity, favorable survival, and the lowest TP53 mutation rate. C2 had high immune, developmental, and stromal pathway activities, the lowest tumor purity, and intratumor heterogeneity. C3, which overexpressed cell cycle and DNA repair pathways, was the most genomically unstable and had the highest TP53 mutation rate. C4 showed overrepresentation of neuronal pathways and had the lowest response rate to chemotherapy but the highest tumor purity and genomic stability. Furthermore, age correlated positively with the activities of most pathways but negatively with those of neuronal pathways. Smoking, viral infections, and alcohol use were found to affect the activities of neuronal, cell cycle, immune, stromal, developmental, and metabolic pathways to varying degrees. The PathClustNet algorithm unveils a novel pan-cancer classification based on metabolic, immune, stromal, developmental, cell cycle, and neuronal pathways. These subtypes display distinct molecular and clinical features that warrant investigation for precision oncology.
Keywords: Gene clusters; Pan-cancer; PathClustNet algorithm; Pathway enrichment analysis; Subtyping; Tumor heterogeneity.
© 2024. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
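
The workflow summarized in the abstract (gene clusters, then overrepresented pathways, then clustering on pathway enrichment scores) can be illustrated with a minimal Python sketch. This is not the PathClustNet implementation, and every concrete choice here is an assumption made for illustration: the gene clusters are supplied as input because the abstract does not say how they are detected, a hypergeometric test stands in for the overrepresentation analysis, mean expression stands in for the enrichment score, and plain k-means stands in for the clustering step. All function names, gene identifiers, and values are hypothetical toy data.

```python
from math import comb, dist
from statistics import mean

def hypergeom_pvalue(cluster, pathway, universe_size):
    """P(overlap >= observed) when len(cluster) genes are drawn from the universe."""
    k = len(cluster & pathway)
    n, K, N = len(cluster), len(pathway), universe_size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

def enriched_pathways(gene_clusters, pathways, universe_size, alpha=0.05):
    """Keep pathways overrepresented in at least one gene cluster (assumed test and cutoff)."""
    return sorted(
        name for name, genes in pathways.items()
        if any(hypergeom_pvalue(c, genes, universe_size) < alpha for c in gene_clusters)
    )

def pathway_scores(sample, pathways, names):
    """Assumed stand-in for an enrichment score: mean expression of the pathway's genes."""
    return [mean(sample[g] for g in pathways[name]) for name in names]

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, seeded deterministically from the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # leave an empty cluster's center where it was
                centers[j] = [mean(col) for col in zip(*members)]
    return labels

# Hypothetical toy data: a 20-gene universe, three pathways, two gene clusters.
pathways = {
    "metabolic": {"g1", "g2", "g3", "g4"},
    "immune": {"g5", "g6", "g7", "g8"},
    "other": {"g9", "g10", "g11", "g12"},
}
gene_clusters = [{"g1", "g2", "g3", "g4"}, {"g5", "g6", "g7", "g13"}]

def toy_sample(metabolic, immune):
    """Build one expression profile with flat values per pathway."""
    sample = {g: 1.0 for g in pathways["other"]}
    sample.update({g: metabolic for g in pathways["metabolic"]})
    sample.update({g: immune for g in pathways["immune"]})
    return sample

samples = [toy_sample(5.0, 1.0), toy_sample(1.0, 5.0),
           toy_sample(6.0, 1.0), toy_sample(1.0, 6.0)]

enriched = enriched_pathways(gene_clusters, pathways, universe_size=20)
scores = [pathway_scores(s, pathways, enriched) for s in samples]
labels = kmeans(scores, k=2)
print(enriched, labels)
```

In this toy setup the "other" pathway overlaps neither gene cluster and is dropped before scoring, so the samples are grouped only on the pathways that the gene clusters point to. That is the property the abstract emphasizes: the pathways used for subtyping are derived from the data rather than specified beforehand.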