An ensemble-based strategy for robust predictive volcanic rock typing efficiency on a global-scale: A novel workflow driven by big data analytics

Umar Ashraf; Hucai Zhang; Aqsa Anees; Muhammad Ali; Hassan Nasir Mangi; Xiaonan Zhang

doi:10.1016/j.scitotenv.2024.173425

Sci Total Environ. 2024 Aug 10;937:173425. doi: 10.1016/j.scitotenv.2024.173425. Epub 2024 May 24.

Authors

Umar Ashraf¹, Hucai Zhang², Aqsa Anees³, Muhammad Ali⁴, Hassan Nasir Mangi⁵, Xiaonan Zhang⁶

Affiliations

¹ Institute of International Rivers and Eco-Security, Yunnan University, Kunming 650500, China; Institute for Ecological Research and Pollution Control of Plateau Lakes, School of Ecology and Environmental Science, Yunnan University, Kunming 650500, China.
² Institute for Ecological Research and Pollution Control of Plateau Lakes, School of Ecology and Environmental Science, Yunnan University, Kunming 650500, China. Electronic address: zhanghc@ynu.edu.cn.
³ Institute of International Rivers and Eco-Security, Yunnan University, Kunming 650500, China; Institute for Ecological Research and Pollution Control of Plateau Lakes, School of Ecology and Environmental Science, Yunnan University, Kunming 650500, China. Electronic address: aqsaanees@ynu.edu.cn.
⁴ Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, China.
⁵ School of Mines, China University of Mining and Technology, Xuzhou 221116, China.
⁶ Institute for Ecological Research and Pollution Control of Plateau Lakes, School of Ecology and Environmental Science, Yunnan University, Kunming 650500, China.

PMID: 38795994

Abstract

Laboratory measurements, paleontological data, and well logs are often used to conduct mineralogical and chemical analyses for classifying rock samples. Employing digital intelligence techniques may improve the accuracy of classification predictions while speeding up the classification process. We aim to develop a comprehensive approach for categorizing igneous rock types based on their global geochemical characteristics. Our strategy integrates advanced clustering, classification, data mining, and statistical methods, using a worldwide geochemical data set of ~25,000 points from 15 igneous rock types. In this pioneering study, we employed hierarchical clustering, linear projection analysis, and multidimensional scaling to determine the frequency distribution and oxide content of igneous rock types globally. The study included eight classifiers: Logistic Regression (LR), Gradient Boosting (GB), Random Forest (RF), K-nearest Neighbors (KNN), Support Vector Machine (SVM), Artificial Neural Network (ANN), and two ensemble-based classifier models, EN-1 and EN-2. EN-1 aggregated the predictions of LR, GB, and RF, whereas EN-2 aggregated the predictions of all ML models used in our study. EN-2 reached an accuracy of 99.2%, EN-1 achieved 98%, and ANN yielded 98.2%. EN-2 also performed best on the receiver operating characteristic (ROC) curve, rising highest at the start and staying highest over the longest range. In the feature ranking, SiO₂ was the most important feature, followed by K₂O and Na₂O. Our findings indicate that ensemble models enhance the accuracy and reliability of predictions by effectively capturing diverse patterns and correlations within the data. Consequently, this leads to more precise rock typing results globally.

Keywords: Data mining; Ensemble-based method; Global distribution; Machine learning; Volcanic rocks.
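
The abstract describes EN-1 and EN-2 as aggregates of base-classifier predictions but does not state the aggregation rule. As an illustration only, the sketch below assumes a simple plurality (majority) vote over per-model labels, which may differ from the authors' method. The function names (majority_vote, accuracy), the rock-type labels, and the toy predictions are all hypothetical and are not taken from the study's data.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by plurality vote.

    `predictions` is a list of equal-length label lists, one per base model.
    Ties go to the tied label predicted by the earliest-listed model.
    NOTE: plurality voting is an assumption; the abstract does not say how
    EN-1 and EN-2 aggregate their base models.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for votes in zip(*predictions):  # votes = one label per model for a sample
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first label (in model order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels and base-model outputs (not from the paper).
y_true = ["basalt", "andesite", "rhyolite", "basalt", "dacite"]
lr_pred = ["basalt", "andesite", "dacite", "basalt", "dacite"]
gb_pred = ["basalt", "dacite", "rhyolite", "basalt", "dacite"]
rf_pred = ["andesite", "andesite", "rhyolite", "basalt", "rhyolite"]

# EN-1-style aggregate of three base models (LR, GB, RF in the abstract).
ensemble_pred = majority_vote([lr_pred, gb_pred, rf_pred])

for name, pred in [("LR", lr_pred), ("GB", gb_pred), ("RF", rf_pred),
                   ("ensemble", ensemble_pred)]:
    print(f"{name}: accuracy = {accuracy(y_true, pred):.2f}")
```

In this toy case each base model misclassifies different samples, so the vote corrects every individual error. That is the general motivation for ensembling, not a reproduction of the reported accuracies.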