Predictive Accuracy of Stroke Risk Prediction Models Across Black and White Race, Sex, and Age Groups

JAMA. 2023 Jan 24;329(4):306-317. doi: 10.1001/jama.2022.24683.

Abstract

Importance: Stroke is the fifth-highest cause of death in the US and a leading cause of serious long-term disability with particularly high risk in Black individuals. Quality risk prediction algorithms, free of bias, are key for comprehensive prevention strategies.

Objective: To compare the performance of stroke-specific algorithms with pooled cohort equations developed for atherosclerotic cardiovascular disease for the prediction of new-onset stroke across different subgroups (race, sex, and age) and to determine the added value of novel machine learning techniques.

Design, setting, and participants: Retrospective cohort study on combined and harmonized data from Black and White participants of the Framingham Offspring, Atherosclerosis Risk in Communities (ARIC), Multi-Ethnic Study of Atherosclerosis (MESA), and Reasons for Geographic and Racial Differences in Stroke (REGARDS) studies (1983-2019) conducted in the US. The 62 482 participants included at baseline were at least 45 years of age and free of stroke or transient ischemic attack.

Exposures: Published stroke-specific algorithms from Framingham and REGARDS (based on self-reported risk factors) as well as pooled cohort equations for atherosclerotic cardiovascular disease plus 2 newly developed machine learning algorithms.

Main outcomes and measures: Models were designed to estimate the 10-year risk of new-onset stroke (ischemic or hemorrhagic). Discrimination (concordance index [C index]) and calibration (ratios of observed to expected event rates) were assessed at 10 years. Analyses were conducted by race, sex, and age groups.
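As an illustration of the discrimination measure named above, the following is a minimal sketch of a concordance index for right-censored follow-up data (Harrell's formulation). The abstract does not state which estimator or software the authors used, so the function name `c_index`, its arguments, and the handling of ties are assumptions made for this example only. The sketch also does not truncate follow-up at 10 years.

```python
from itertools import combinations


def c_index(times, events, risks):
    """Harrell-style concordance index (illustrative sketch only).

    times  -- follow-up time for each participant
    events -- 1 if a stroke was observed, 0 if follow-up was censored
    risks  -- predicted risk from the model (higher = more likely to have a stroke)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # assumption: pairs with tied times are skipped
        if not events[a]:
            continue  # earlier participant was censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 0, 1, 1, 0]
toy_risks = [0.30, 0.10, 0.25, 0.05, 0.08]
toy_c = c_index(toy_times, toy_events, toy_risks)  # 6 of 7 comparable pairs are concordant
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the C indexes in the Results are reported.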

Results: The combined study sample included 62 482 participants (median age, 61 years; 54% women; 29% Black individuals). Discrimination C indexes were not significantly different for the 2 stroke-specific models (Framingham stroke, 0.72; 95% CI, 0.72-0.73; REGARDS self-report, 0.73; 95% CI, 0.72-0.74) vs the pooled cohort equations (0.72; 95% CI, 0.71-0.73): differences were 0.01 or less (P values >.05) in the combined sample. Significant differences in discrimination were observed by race: the C indexes were 0.76 for all 3 models in White women vs 0.69 in Black women (all P values <.001) and between 0.71 and 0.72 in White men vs between 0.64 and 0.66 in Black men (all P values ≤.001). When stratified by age, model discrimination was better for younger (<60 years) vs older (≥60 years) adults for both Black and White individuals. The ratios of observed to expected 10-year stroke rates were closest to 1 for the REGARDS self-report model (1.05; 95% CI, 1.00-1.09) and indicated risk overestimation for Framingham stroke (0.86; 95% CI, 0.82-0.89) and pooled cohort equations (0.74; 95% CI, 0.71-0.77). Performance did not significantly improve when novel machine learning algorithms were applied.
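To make the calibration ratios above concrete, the following is a minimal sketch of an observed-to-expected ratio at a fixed horizon. It assumes the observed rate is a Kaplan-Meier estimate of cumulative incidence and the expected rate is the mean predicted risk. The abstract does not specify how the authors estimated either quantity, so the function names and the toy data are hypothetical.

```python
def km_event_rate(times, events, horizon=10.0):
    """Kaplan-Meier estimate of the probability of an event by `horizon`."""
    survival = 1.0
    # Step through each distinct event time up to the horizon.
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - n_events / at_risk
    return 1.0 - survival


def observed_to_expected(times, events, risks, horizon=10.0):
    """Ratio of the observed event rate to the mean predicted risk."""
    observed = km_event_rate(times, events, horizon)
    expected = sum(risks) / len(risks)
    return observed / expected


# Hypothetical toy data, not taken from the study.
toy_times = [2, 4, 6, 8, 10, 12]
toy_events = [1, 0, 1, 0, 0, 0]
toy_risks = [0.5] * 6  # the model predicts a 50% 10-year risk for everyone
toy_ratio = observed_to_expected(toy_times, toy_events, toy_risks)
```

In this toy case the observed 10-year rate is 0.375 against an expected 0.5, so the ratio is below 1: the model predicted more events than occurred. This is the direction of miscalibration reported for the Framingham stroke model and the pooled cohort equations.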

Conclusions and relevance: In this analysis of Black and White individuals without stroke or transient ischemic attack among 4 US cohorts, existing stroke-specific risk prediction models and novel machine learning techniques did not significantly improve discriminative accuracy for new-onset stroke compared with the pooled cohort equations, and the REGARDS self-report model had the best calibration. All algorithms exhibited worse discrimination in Black individuals than in White individuals, indicating the need to expand the pool of risk factors and improve modeling techniques to address observed racial disparities and improve model performance.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Age Factors
  • Atherosclerosis / epidemiology
  • Bias
  • Black People* / statistics & numerical data
  • Black or African American
  • Cardiovascular Diseases / epidemiology
  • Computer Simulation / standards
  • Computer Simulation / statistics & numerical data
  • Female
  • Healthcare Disparities* / ethnology
  • Healthcare Disparities* / standards
  • Healthcare Disparities* / statistics & numerical data
  • Humans
  • Ischemic Attack, Transient / epidemiology
  • Machine Learning / standards
  • Male
  • Middle Aged
  • Prejudice* / prevention & control
  • Race Factors / statistics & numerical data
  • Reproducibility of Results
  • Retrospective Studies
  • Risk Assessment* / standards
  • Sex Factors
  • Stroke* / diagnosis
  • Stroke* / epidemiology
  • Stroke* / ethnology
  • United States / epidemiology
  • White People* / statistics & numerical data