Objective: COVID-19 would kill fewer people if health programmes could predict who is at higher risk of mortality, because resources could then be targeted to protect those people from infection. Using machine learning, we predict mortality in a very large population in Mexico from demographic variables and pre-existing conditions.
Design: Cohort study.
Setting: Mexico, with national coverage, March 2020 to November 2021.
Participants: 1.4 million patients in Mexico aged 20 years or older with laboratory-confirmed COVID-19.
Primary and secondary outcome measures: Analysis covers the full period from March 2020 to November 2021 and three phases within it: (1) March to October 2020, (2) November 2020 to March 2021 and (3) April to November 2021. We predict mortality using super learner, an ensemble machine learning method, and independently estimate the adjusted mortality relative risk of each pre-existing condition using targeted maximum likelihood estimation.
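To illustrate the ensemble idea behind super learner, the following is a minimal sketch and not the study's implementation. It assumes synthetic data, two deliberately simple base learners (death rates by age band and by condition status) and a grid search over a single convex weight, chosen to minimise the cross-validated Brier score. All names (`simulate`, `fit_rate_by`, `best_w`) are hypothetical, and a full super learner would typically combine many more learners with a more general weight optimiser.

```python
import random

random.seed(0)

# Synthetic cohort (hypothetical, NOT the study's data): age in years, one
# binary pre-existing condition, and death more likely at older ages.
def simulate(n):
    rows = []
    for _ in range(n):
        age = random.randint(20, 90)
        cond = 1 if random.random() < 0.3 else 0
        p = min(0.95, 0.01 + 0.004 * (age - 20) + 0.10 * cond)
        died = 1 if random.random() < p else 0
        rows.append((age, cond, died))
    return rows

# Base learner factory: predicts the training death rate within each group
# defined by key(age, cond), falling back to the overall rate for unseen groups.
def fit_rate_by(key):
    def fit(train):
        overall = sum(r[2] for r in train) / len(train)
        totals = {}
        for age, cond, died in train:
            k = key(age, cond)
            d, n = totals.get(k, (0, 0))
            totals[k] = (d + died, n + 1)
        rates = {k: d / n for k, (d, n) in totals.items()}
        return lambda age, cond: rates.get(key(age, cond), overall)
    return fit

fit_age_band = fit_rate_by(lambda age, cond: age // 10)  # 10-year age bands
fit_condition = fit_rate_by(lambda age, cond: cond)      # condition yes/no

# Out-of-fold predictions: each row is predicted by models that never saw it.
def cross_validated_predictions(data, fitters, k=5):
    preds = [[0.0] * len(data) for _ in fitters]
    for fold in range(k):
        train = [r for i, r in enumerate(data) if i % k != fold]
        models = [fit(train) for fit in fitters]
        for i, (age, cond, _) in enumerate(data):
            if i % k == fold:
                for j, model in enumerate(models):
                    preds[j][i] = model(age, cond)
    return preds

def brier(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

data = simulate(2000)
y = [r[2] for r in data]
cv_age, cv_cond = cross_validated_predictions(data, [fit_age_band, fit_condition])

def blend(w):
    # Convex combination of the two learners' out-of-fold predictions.
    return [w * a + (1 - w) * b for a, b in zip(cv_age, cv_cond)]

# Meta-learning step: pick the weight with the lowest cross-validated loss.
grid = [w / 20 for w in range(21)]
best_w = min(grid, key=lambda w: brier(blend(w), y))
print("weight on age-band learner:", best_w)
```

Because the grid includes the weights 0 and 1, the blended predictor can never have a worse cross-validated Brier score than either base learner alone, which is the property that motivates ensembling in this way.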
Results: The super learner fit has high predictive performance (C-statistic: 0.907), with age the most predictive factor for mortality. After adjusting for demographic factors, renal disease, hypertension, diabetes and obesity are the most impactful pre-existing conditions. Phase analysis shows that the adjusted mortality risk decreased over time while the relative risk increased for each pre-existing condition.
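For readers unfamiliar with the C-statistic, it can be read as the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it by direct pairwise comparison on made-up toy values; the function name `c_statistic` and the data are illustrative assumptions, not taken from the study.

```python
def c_statistic(y_true, y_score):
    """Share of (death, survivor) pairs in which the death has the higher
    predicted risk; tied scores count as half a concordant pair."""
    deaths = [s for y, s in zip(y_true, y_score) if y == 1]
    survivors = [s for y, s in zip(y_true, y_score) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                concordant += 1.0
            elif d == s:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))

# Toy values for illustration only (1 = died, 0 = survived).
outcomes = [1, 0, 1, 0, 0]
predicted = [0.9, 0.2, 0.4, 0.5, 0.1]

# 5 of the 6 death-survivor pairs are concordant.
print(round(c_statistic(outcomes, predicted), 3))  # prints 0.833
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, so a cohort of this scale would call for a rank-based computation instead.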
Conclusions: Although age is the most important predictor of mortality, younger individuals with hypertension, diabetes and obesity face a mortality risk comparable to that of individuals who are 20 years older and have none of the three conditions. Our model can be continuously updated to identify the individuals who most need protection against infection as the pandemic evolves.
Keywords: COVID-19; epidemiology; general diabetes; hypertension; risk factors; statistics & research methods.
© Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY. Published by BMJ.