Interpretable prediction of 3-year all-cause mortality in patients with heart failure caused by coronary heart disease based on machine learning and SHAP

Ke Wang; Jing Tian; Chu Zheng; Hong Yang; Jia Ren; Yanling Liu; Qinghua Han; Yanbo Zhang

doi:10.1016/j.compbiomed.2021.104813

Comput Biol Med. 2021 Oct:137:104813. doi: 10.1016/j.compbiomed.2021.104813. Epub 2021 Aug 28.

Authors

Ke Wang¹, Jing Tian², Chu Zheng³, Hong Yang³, Jia Ren⁴, Yanling Liu³, Qinghua Han², Yanbo Zhang⁵

Affiliations

¹ Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China; Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, People's Republic of China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China.
² Department of Cardiology, The First Affiliated Hospital of Shanxi Medical University, Taiyuan, People's Republic of China.
³ Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China.
⁴ Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China.
⁵ Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China. Electronic address: sxmuzyb@126.com.

PMID: 34481185
DOI: 10.1016/j.compbiomed.2021.104813

Abstract

Background: This study sought to evaluate the performance of machine learning (ML) models and establish an explainable ML model with good prediction of 3-year all-cause mortality in patients with heart failure (HF) caused by coronary heart disease (CHD).

Methods: We established six ML models using follow-up data to predict 3-year all-cause mortality. Through comprehensive evaluation, the best performing model was used to predict and stratify patients. The log-rank test was used to assess the difference between Kaplan-Meier curves. The association between ML risk and 3-year all-cause mortality was also assessed using multivariable Cox regression. Finally, an explainable approach based on ML and the SHapley Additive exPlanations (SHAP) method was deployed to calculate 3-year all-cause mortality risk and to generate individual explanations of the model's decisions.
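To make the SHAP step more concrete, the following is a minimal sketch of how Shapley values split one patient's predicted risk into per-feature contributions. It is not the authors' pipeline: `toy_risk`, the three feature names, their coefficients, and the `patient` and `baseline` values are all hypothetical stand-ins for the trained model and cohort data. It also enumerates every feature subset against a single reference patient, whereas the SHAP library uses faster model-specific algorithms for tree ensembles.

```python
from itertools import combinations
from math import factorial


def toy_risk(features):
    """Hypothetical risk score standing in for a trained model (not the paper's XGBoost)."""
    score = 0.02 * features["age"] + 0.0001 * features["nt_probnp"] + 0.1 * features["nyha"]
    # An interaction term, so that contributions are not simply the linear terms.
    if features["age"] > 70 and features["nyha"] >= 3:
        score += 0.3
    return score


def shapley_values(model, patient, baseline):
    """Exact Shapley values; features outside a coalition take the baseline value."""
    names = list(patient)
    n = len(names)
    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude `name`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_feature = {
                    m: patient[m] if (m in subset or m == name) else baseline[m]
                    for m in names
                }
                without_feature = {
                    m: patient[m] if m in subset else baseline[m] for m in names
                }
                total += weight * (model(with_feature) - model(without_feature))
        phi[name] = total
    return phi


# Hypothetical patient and reference values, for illustration only.
patient = {"age": 78, "nt_probnp": 3000, "nyha": 3}
baseline = {"age": 60, "nt_probnp": 500, "nyha": 2}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
```

The contributions sum to the difference between the patient's score and the baseline score. This additivity is what lets an individual prediction be read as a baseline risk plus one signed term per feature.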

Results: The extreme gradient boosting (XGBoost) model performed best and was selected to predict and stratify patients. Subjects with a higher ML score had a higher hazard of events (hazard ratio [HR]: 10.351; P < 0.001), and this relationship persisted in multivariable analysis (adjusted HR: 5.343; P < 0.001). Age, N-terminal pro-B-type natriuretic peptide, occupation, New York Heart Association classification, and nitrate drug use were important factors for both genders.
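The comparison between risk strata relies on the log-rank test named in the Methods. A minimal sketch of the two-group statistic is given here in plain Python. The follow-up times (in months, censored at 36) and event flags are invented for illustration and are not the study data, and `log_rank` is a hypothetical helper name. In practice a survival analysis library would be used, and the hazard ratios reported above come from Cox regression, which this sketch does not cover.

```python
from math import erfc, sqrt


def log_rank(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, with event = 1 for death
    and 0 for censoring. Returns (chi-square statistic, p-value on 1 df).
    """
    pooled = group_a + group_b
    event_times = sorted({t for t, e in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t.
        n_a = sum(1 for time, _ in group_a if time >= t)
        n_b = sum(1 for time, _ in group_b if time >= t)
        # Deaths occurring exactly at time t.
        d_a = sum(1 for time, e in group_a if time == t and e)
        d_b = sum(1 for time, e in group_b if time == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the deaths in group A at time t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Invented follow-up data: (months, event), censored at 36 months.
high_risk = [(5, 1), (8, 1), (12, 1), (20, 1), (36, 0)]
low_risk = [(30, 1), (36, 0), (36, 0), (36, 0), (36, 0)]

chi2, p_value = log_rank(high_risk, low_risk)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

The statistic compares the deaths observed in one group with the number expected if both groups shared the same hazard at every event time, so a small p-value indicates that the Kaplan-Meier curves of the strata differ.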

Conclusions: The ML-based risk stratification tool was able to accurately assess and stratify the risk of 3-year all-cause mortality in patients with HF caused by CHD. ML combined with SHAP could provide an explicit explanation of individualized risk prediction and give physicians an intuitive understanding of the influence of key features in the model.

Keywords: Heart failure; Interpretable model; Machine learning; SHAP value.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Coronary Disease* / complications
Female
Heart Failure*
Humans
Machine Learning
Male