Background: Coronary heart disease (CHD) and cerebral ischemic stroke (CIS) are two major types of cardiovascular disease (CVD) that place a growing burden on healthcare systems worldwide. Machine learning holds great promise for improving the accuracy of disease prediction and risk stratification in CVD. However, there is currently no clinically applicable risk stratification model for the Asian population. This study developed machine learning-based risk stratification models for CHD and CIS to address this gap.
Methods: A case-control study was conducted based on 8,624 electronic medical records from 2008 to 2019 at Tongji Hospital in Wuhan, China. Random down-sampling and random forest were combined to build 2 ensemble models, one for CHD and one for CIS. The trained models were then interpreted using Shapley Additive exPlanations (SHAP).
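The random down-sampling step named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function name `balanced_bags`, the 0/1 label encoding, the number of bags, and the toy labels are all assumptions made for illustration. Each bag keeps every minority-class record and an equally sized random draw from the majority class; a classifier such as a random forest could then be trained on each bag and the per-bag predictions combined, for example by averaging (the combination rule is likewise an assumption).

```python
import random
from collections import Counter


def balanced_bags(labels, n_bags=5, seed=0):
    """Return n_bags lists of record indices, each class-balanced by random down-sampling.

    labels: sequence of 0/1 class labels (hypothetical encoding: 1 = case, 0 = control).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # The smaller class is kept whole; the larger class is down-sampled to match it.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)

    bags = []
    for _ in range(n_bags):
        # Draw from the majority class without replacement, one fresh draw per bag.
        sampled = rng.sample(majority, len(minority))
        bag = minority + sampled
        rng.shuffle(bag)
        bags.append(bag)
    return bags


# Toy data (made up): 3 cases and 9 controls.
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
for bag in balanced_bags(labels, n_bags=3):
    # Each bag should contain equal numbers of cases and controls.
    print(Counter(labels[i] for i in bag))
```

Drawing a different majority subset for each bag lets the ensemble see most control records overall while every individual model is trained on balanced data.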
Results: The CHD and CIS models achieved areas under the receiver operating characteristic curve (AUCs) of 0.895 and 0.884, respectively, in random testing, and 0.905 and 0.889, respectively, in sequential testing. We identified 4 factors common to CHD and CIS: age, brachial-ankle pulse wave velocity, hypertension, and low-density lipoprotein cholesterol (LDL-C). Moreover, carcinoembryonic antigen (CEA) was identified as an independent indicator for CHD.
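As a reading aid for the AUC values reported above: the AUC equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. The function name `auc` and the labels and scores are made up for illustration and are not data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: 0/1 class labels (hypothetical encoding: 1 = case, 0 = control).
    scores: predicted risk for each record, higher meaning more likely a case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (made up): 3 cases and 4 controls.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
print(auc(labels, scores))
```

The pairwise form is quadratic in the number of records, which is fine for illustration; rank-based implementations give the same value more efficiently on large datasets.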
Conclusions: Our ensemble models can provide risk stratification for CHD and CIS with clinically applicable performance. By interpreting the trained models, we provided insights into the common and unique indicators in CHD and CIS. These findings may contribute to a better understanding and management of risk factors associated with CVD.
Keywords: Coronary heart disease (CHD); ischemic stroke; machine learning; risk stratification.
2022 Annals of Translational Medicine. All rights reserved.