Can machine-learning improve cardiovascular risk prediction using routine clinical data?

Stephen F Weng; Jenna Reps; Joe Kai; Jonathan M Garibaldi; Nadeem Qureshi

doi:10.1371/journal.pone.0174944

PLoS One. 2017 Apr 4;12(4):e0174944. doi: 10.1371/journal.pone.0174944. eCollection 2017.

Authors

Stephen F Weng¹ ², Jenna Reps³ ⁴, Joe Kai¹ ², Jonathan M Garibaldi³ ⁴, Nadeem Qureshi¹ ²

Affiliations

¹ NIHR School for Primary Care Research, University of Nottingham, Nottingham, United Kingdom.
² Division of Primary Care, School of Medicine, University of Nottingham, Nottingham, United Kingdom.
³ Advanced Data Analysis Centre, University of Nottingham, Nottingham, United Kingdom.
⁴ School of Computer Science, University of Nottingham, Nottingham, United Kingdom.

Abstract

Background: Current approaches to predict cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine-learning offers an opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine-learning can improve cardiovascular risk prediction.

Methods: Prospective cohort study using routine clinical data of 378,256 patients from UK family practices, free from cardiovascular disease at outset. Four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) were compared to an established algorithm (American College of Cardiology guidelines) to predict first cardiovascular event over 10 years. Predictive accuracy was assessed by area under the receiver operating characteristic curve (AUC), and by sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) at the 7.5% cardiovascular risk threshold (the threshold for initiating statins).
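The two evaluation steps named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy risk scores, and the choice of "greater than or equal to" at the 7.5% cut-off are assumptions made for illustration only. AUC is computed here as the proportion of case/non-case pairs in which the case receives the higher predicted risk, with ties counted as half.

```python
def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    Quadratic in the number of patients, so only suitable for small examples.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    correct = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5
    return correct / (len(cases) * len(non_cases))


def classify_high_risk(scores, threshold=0.075):
    """Flag patients whose predicted 10-year risk meets the threshold.

    Assumption: risk >= threshold counts as high risk.
    """
    return [s >= threshold for s in scores]


# Toy data, invented for illustration; not taken from the study.
toy_risk = [0.02, 0.05, 0.08, 0.10, 0.30, 0.04]   # predicted 10-year risk
toy_event = [0, 0, 1, 0, 1, 0]                    # 1 = cardiovascular event

print(auc(toy_risk, toy_event))          # 0.875
print(classify_high_risk(toy_risk))      # [False, False, True, True, True, False]
```

In the toy data, 7 of the 8 case/non-case pairs are ranked correctly, which gives an AUC of 0.875.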

Findings: 24,970 incident cardiovascular events (6.6%) occurred. Compared to the established risk prediction algorithm (AUC 0.728, 95% CI 0.723-0.735), machine-learning algorithms improved prediction: random forest +1.7% (AUC 0.745, 95% CI 0.739-0.750), logistic regression +3.2% (AUC 0.760, 95% CI 0.755-0.766), gradient boosting +3.3% (AUC 0.761, 95% CI 0.755-0.766), neural networks +3.6% (AUC 0.764, 95% CI 0.759-0.769). The highest-achieving algorithm (neural networks) correctly predicted 4,998/7,404 cases (sensitivity 67.5%, PPV 18.4%) and 53,458/75,585 non-cases (specificity 70.7%, NPV 95.7%), correctly predicting 355 (+7.6%) more patients who developed cardiovascular disease compared to the established algorithm.
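The reported sensitivity, specificity, PPV and NPV for the neural network can be re-derived from the counts given above. The sketch below uses only those counts; the function and variable names are illustrative. The last step assumes the "+7.6%" is measured relative to the number of cases the established algorithm identified, a figure inferred here by subtraction rather than stated in the abstract.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics, returned as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # non-cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who are cases
        "npv": tn / (tn + fn),          # cleared patients who are non-cases
    }


# Counts reported for the neural network.
cases_total = 7404
non_cases_total = 75585
tp = 4998                    # cases correctly predicted
tn = 53458                   # non-cases correctly predicted
fn = cases_total - tp        # 2,406 missed cases
fp = non_cases_total - tn    # 22,127 non-cases flagged as high risk

metrics = diagnostic_metrics(tp, fn, tn, fp)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# sensitivity: 67.5%
# specificity: 70.7%
# ppv: 18.4%
# npv: 95.7%

# Inferred, not reported: cases identified by the established algorithm.
baseline_tp = tp - 355
relative_gain = 355 / baseline_tp
print(f"relative gain in cases identified: {relative_gain:.1%}")  # 7.6%
```

The low PPV alongside the high NPV reflects the low event rate: most patients flagged at the 7.5% threshold do not go on to have an event within 10 years.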

Conclusions: Machine-learning significantly improves accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others.

MeSH terms

Adult
Aged
Aged, 80 and over
Algorithms
Cardiovascular Diseases / etiology*
Cardiovascular Diseases / prevention & control*
Cohort Studies
Electronic Health Records / statistics & numerical data
Female
Humans
Logistic Models
Machine Learning*
Male
Middle Aged
Neural Networks, Computer
Prospective Studies
Risk Factors

Grants and funding

This paper presents independent research funded by the National Institute for Health Research School for Primary Care Research (NIHR SPCR): personal training fellowship award for SW from 2015-2018. URL: https://www.spcr.nihr.ac.uk/trainees. The views expressed are those of the authors and not necessarily those of the NIHR, the NHS, or the Department of Health.