ecGBMsub: an integrative stacking ensemble model framework based on eccDNA molecular profiling for improving IDH wild-type glioblastoma molecular subtype classification

Front Pharmacol. 2024 Apr 11:15:1375112. doi: 10.3389/fphar.2024.1375112. eCollection 2024.

Abstract

IDH wild-type glioblastoma (GBM) intrinsic subtypes have been linked to different molecular landscapes and outcomes. Accurate prediction of GBM molecular subtypes is important for guiding clinical diagnosis and treatment. Leveraging machine learning to improve subtype classification is considered a robust strategy. Several single machine learning models have been developed to predict survival or stratify patients. An ensemble learning strategy combines several base learners to boost model performance. However, a robust stacking ensemble learning model with high accuracy in clinical practice is still lacking. Here, we developed a novel integrative stacking ensemble model framework (ecGBMsub) for improving IDH wild-type GBM molecular subtype classification. In the framework, nine single models with the best hyperparameters were fitted based on extrachromosomal circular DNA (eccDNA) molecular profiling. Then, the five best-performing single models were selected as base models. By randomly combining these five base models, 26 different combinations were generated. Nine different meta-models with the best hyperparameters were fitted on the prediction results of each of the 26 combinations, resulting in 234 (26 × 9) different stacked ensemble models. All models in ecGBMsub were comprehensively evaluated and compared. Finally, the stacking ensemble model named "XGBoost.Enet-stacking-Enet" was chosen as the optimal model in the ecGBMsub framework. A user-friendly web tool was developed to facilitate access to the XGBoost.Enet-stacking-Enet model (https://lizesheng20190820.shinyapps.io/ecGBMsub/).
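
The model counts in the abstract can be checked with a short enumeration. The sketch below is illustrative only and is not the authors' code. It assumes that "combining" the five base models means taking every subset of two or more of them, which is an interpretation that happens to reproduce the reported 26 combinations. The abstract names only XGBoost and Enet, so all other model names here are hypothetical placeholders.

```python
from itertools import combinations

# Only "XGBoost" and "Enet" are named in the abstract; the rest are placeholders.
base_models = ["XGBoost", "Enet", "base_3", "base_4", "base_5"]
# Nine meta-models; only "Enet" is named in the abstract.
meta_models = ["Enet"] + [f"meta_{i}" for i in range(2, 10)]


def base_combinations(models, min_size=2):
    """Return every subset of `models` containing at least `min_size` members.

    Assumption: a stacked ensemble needs two or more base models.
    """
    return [
        combo
        for size in range(min_size, len(models) + 1)
        for combo in combinations(models, size)
    ]


combos = base_combinations(base_models)

# Pair each base-model combination with each candidate meta-model.
stacked = [(combo, meta) for combo in combos for meta in meta_models]

print(len(combos))   # 10 + 10 + 5 + 1 = 26
print(len(stacked))  # 26 * 9 = 234
```

Under this reading, the selected "XGBoost.Enet-stacking-Enet" model corresponds to the pair `(("XGBoost", "Enet"), "Enet")`: XGBoost and Enet as base models, with Enet as the meta-model.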

Keywords: IDH wild-type glioblastoma; brain tumor; eccDNA; ensemble model; machine learning; molecular subtype.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. The project was supported by grants (No. 81773187; No. 82273493) from the National Natural Science Foundation of China. The project was sponsored by the Tianjin Health Research Project (TJWJ2023ZD001) and the Natural Science Foundation of Henan Province for Excellent Young Scholars (No. 232300421057). The project was also supported by the Jinan Clinical Medical Science and Technology Innovation Plan project (202134059).