Dialogue between algorithms and soil: Machine learning unravels the mystery of phthalates pollution in soil

J Hazard Mater. 2024 Nov 20:482:136604. doi: 10.1016/j.jhazmat.2024.136604. Online ahead of print.

Abstract

Soil is a major environmental sink for phthalate esters (PAEs), a class of emerging organic pollutants, and identifying the key factors that influence PAE accumulation in soil is crucial for agricultural sustainability and food security. Because traditional batch experiments and statistical prediction models are time-consuming and inefficient at comprehensively capturing PAE dynamics in soil, an intelligent analysis framework based on machine learning was developed. Thirty features were incorporated, including soil PAE concentrations, pollutant emissions, agricultural inputs, soil physicochemical properties, and climatic parameters. Six data-driven machine learning models were established: Random Forest Regression (RFR), Gradient Boosting Regression Tree (GBRT), Extreme Gradient Boosting (XGBoost), Multilayer Perceptron (MLP), Support Vector Regression (SVR), and k-Nearest Neighbors (KNN). The MLP model performed best in predicting soil PAE concentrations (R²=0.8637), followed by SVR (R²=0.8132) and XGBoost (R²=0.8096). Feature importance analysis identified hydrometeorological factors, soil moisture conditions, and nutritional characteristics as the key factors controlling the spatial distribution of PAEs. Furthermore, non-linear effect analysis revealed significant synergistic interactions among these environmental covariates. The spatiotemporal prediction model projected continuously declining PAE pollution levels in eastern coastal regions over the next 5-10 years, whereas accumulation tendencies were projected for inland provinces, particularly Guizhou. This study demonstrates the effectiveness and advantages of machine learning in predicting soil PAE pollution, providing a new perspective for pollutant risk assessment and management in the era of environmental big data.
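
The models above are compared by their coefficient of determination, R² = 1 - SS_res / SS_tot. The following is a minimal sketch of how such a comparison could be computed, not the authors' pipeline: the observed values, the predictions, and the names model_a and model_b are hypothetical placeholders and do not come from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out observations and predictions from two
# placeholder models (illustrative numbers only, not study data).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8, 5.1],
    "model_b": [1.5, 2.5, 2.5, 4.5, 4.0],
}

# Score every model, then rank from highest to lowest R^2.
scores = {name: r_squared(observed, pred) for name, pred in predictions.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: R^2 = {scores[name]:.4f}")
```

An R² of 1 means the predictions match the observations exactly, while 0 means the model does no better than predicting the mean, so a value such as 0.8637 indicates that most of the variance in the held-out concentrations is explained.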

Keywords: Algorithm; Machine learning; Multifactor interaction; Phthalate esters; Soil contamination.