A hybrid approach for modeling bicycle crash frequencies: Integrating random forest based SHAP model with random parameter negative binomial regression model

Accid Anal Prev. 2024 Sep 16;208:107778. doi: 10.1016/j.aap.2024.107778. Online ahead of print.

Abstract

To capture and explain complex, nonlinear relationships in bicycle crash frequency data while simultaneously accounting for unobserved heterogeneity, this study proposes a new hybrid framework that combines the Random Forest-based SHapley Additive exPlanations (RF-SHAP) method with a random parameter negative binomial (RPNB) regression model. First, four machine learning algorithms, namely random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), and Extreme Gradient Boosting (XGBoost), were compared for variable importance calculation. Second, the RF algorithm, which demonstrated the best performance, was selected and integrated into an interpretable machine learning-based method (i.e., RF-SHAP) to provide an interpretable measure of each variable's impact, which is critical for understanding the model's predictions. Finally, the RF-SHAP method was combined with the RPNB model to explore individual-specific variations that influence crash frequency predictions. The proposed framework was validated using 288 traffic analysis zones (TAZs) in Greater London and various regional risk factors for bicycle crash frequency. The results indicate that the framework improves prediction accuracy and factor interpretation in analyzing bicycle crash frequency. The model exhibits consistent Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values, indicating reliable explanatory power. Furthermore, there is a significant improvement in the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). This suggests that the proposed model effectively combines the explanatory power of statistical models with the predictive power of data-driven models. The interpretability of SHAP values, coupled with the causal insights from RPNB, provides policymakers with actionable information to develop targeted interventions.

Keywords: Bicycle frequency; Hybrid approach; Random Forest based SHAP; Random parameter negative binomial regression model; Regional factors; Unobserved effects.
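
As an illustration of the SHAP idea referred to in the abstract, the following is a minimal sketch of how exact Shapley values attribute a single prediction to its input variables. It is not the study's RF-SHAP implementation: the predictor `toy_predict`, its coefficients, the feature names (`cycle_flow`, `junction_density`, `speed_limit`), and the single `baseline` point are all hypothetical and chosen only so the arithmetic is easy to follow. In practice a tree-specific SHAP algorithm applied to a fitted random forest would replace this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to one baseline point.

    Brute-force enumeration over all feature subsets; only feasible for a
    handful of features. Illustrative only, not the study's method.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_predict(z):
    # Hypothetical zonal crash-count predictor with one interaction term.
    cycle_flow, junction_density, speed_limit = z
    return 2.0 * cycle_flow + 1.5 * junction_density + 0.5 * cycle_flow * speed_limit


x = [3.0, 2.0, 1.0]          # hypothetical zone to explain
baseline = [1.0, 1.0, 0.0]   # hypothetical reference zone
phi = shapley_values(toy_predict, x, baseline)

for name, contribution in zip(["cycle_flow", "junction_density", "speed_limit"], phi):
    print(f"{name}: {contribution:+.2f}")
```

The contributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additivity property that makes SHAP values usable as a per-variable measure of impact. The interaction term is split between the two variables involved rather than assigned to either one alone.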