Machine learning improves risk stratification of coronary heart disease and stroke

Bangwei Chen; Lei Ruan; Liuqiao Yang; Yucong Zhang; Yueqi Lu; Yu Sang; Xin Jin; Yong Bai; Cuntai Zhang; Tao Li

doi:10.21037/atm-22-1916

Ann Transl Med. 2022 Nov;10(21):1156. doi: 10.21037/atm-22-1916.

Authors

Bangwei Chen^# ¹ ² ³, Lei Ruan^# ⁴, Liuqiao Yang^# ² ³ ⁵, Yucong Zhang ⁴, Yueqi Lu ² ³, Yu Sang ⁴, Xin Jin ² ³, Yong Bai ² ³, Cuntai Zhang ⁴, Tao Li ² ³

Affiliations

¹ School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China.
² BGI-Shenzhen, Shenzhen, China.
³ China National GeneBank, Shenzhen, China.
⁴ Department of Geriatrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
⁵ College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.

^# Contributed equally.

Abstract

Background: Coronary heart disease (CHD) and cerebral ischemic stroke (CIS) are two major types of cardiovascular disease (CVD) that are increasingly exerting pressure on the healthcare system worldwide. Machine learning holds great promise for improving the accuracy of disease prediction and risk stratification in CVD. However, there is currently no clinically applicable risk stratification model for the Asian population. This study developed a machine learning-based CHD and CIS model to address this issue.

Methods: A case-control study was conducted based on 8,624 electronic medical records from 2008 to 2019 at the Tongji Hospital in Wuhan, China. Two machine learning methods (the random down-sampling method and the random forest method) were integrated into 2 ensemble models (the CHD model and the CIS model). The trained models were then interpreted using Shapley Additive exPlanations (SHAP).
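The combination of random down-sampling and random forests described in the Methods can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy. The abstract does not state how many down-sampled subsets were used or how member predictions were combined, so the number of members, the probability averaging, and the function names `fit_downsampled_forests` and `predict_risk` are hypothetical choices for illustration. The data are synthetic, not the study's records.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_downsampled_forests(X, y, n_members=5, seed=0):
    """Train one random forest per balanced, randomly down-sampled subset.

    Assumption: member count and forest size are illustrative only.
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    members = []
    for i in range(n_members):
        # Keep every minority-class record and draw an equally sized
        # random subset of the majority class without replacement.
        sampled = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sampled])
        forest = RandomForestClassifier(n_estimators=50, random_state=seed + i)
        forest.fit(X[idx], y[idx])
        members.append(forest)
    return members


def predict_risk(members, X):
    """Average the positive-class probability over all ensemble members."""
    return np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)


# Synthetic, imbalanced toy data (roughly one case per nine controls).
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 4))
noise = rng.normal(scale=0.5, size=600)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 1.5).astype(int)

members = fit_downsampled_forests(X, y)
risk = predict_risk(members, X)
```

Each member sees all cases but a different random subset of controls, so the ensemble uses more of the majority class than a single down-sampled model would. The averaged `risk` scores could then be passed to an explainer such as SHAP for interpretation.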

Results: The CHD and CIS models performed well, with areas under the receiver operating characteristic curve (AUCs) of 0.895 and 0.884 in random testing, and 0.905 and 0.889 in sequential testing, respectively. We identified 4 factors common to CHD and CIS: age, brachial-ankle pulse wave velocity, hypertension, and low-density lipoprotein cholesterol (LDL-C). Moreover, carcinoembryonic antigen (CEA) was identified as an independent indicator for CHD.
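To make the AUC figures and the two testing schemes concrete, the following sketch uses only the Python standard library. The abstract does not define "sequential testing". The sketch assumes it means holding out the most recent records, in contrast to a random hold-out. The `year` field, the 20% test fraction, and all function names are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out a random fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def sequential_split(records, test_fraction=0.2):
    """Hold out the most recent records (assumed meaning of 'sequential')."""
    ordered = sorted(records, key=lambda r: r["year"])
    n_test = int(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Toy example: three of the four case-control pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75

records = [{"year": 2008 + i} for i in range(10)]
train_rand, test_rand = random_split(records)
train_seq, test_seq = sequential_split(records)
```

Under this assumption, a sequential split tests the model on records that postdate everything it was trained on, which is closer to prospective use than a random split.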

Conclusions: Our ensemble models can provide risk stratification for CHD and CIS with clinically applicable performance. By interpreting the trained models, we provided insights into the common and unique indicators in CHD and CIS. These findings may contribute to a better understanding and management of risk factors associated with CVD.

Keywords: Coronary heart disease (CHD); ischemic stroke; machine learning; risk stratification.