Background: Unprecedented public health measures have been used during this coronavirus 2019 (COVID-19) pandemic to control the spread of SARS-CoV-2 virus. It is a challenge to implement timely and appropriate public health interventions.
Methods and findings: Population and COVID-19 epidemiological data between 21st January 2020 to 15th November 2020 from 216 countries and territories were included with the implemented public health interventions. We used deep reinforcement learning, and the algorithm was trained to enable agents to try to find optimal public health strategies that maximized total reward on controlling the spread of COVID-19. The results suggested by the algorithm were analyzed against the actual timing and intensity of lockdown and travel restrictions. Early implementations of the actual lockdown and travel restriction policies, usually at the time of local index case were associated with less burden of COVID-19. In contrast, our agent suggested to initiate at least minimal intensity of lockdown or travel restriction even before or on the day of the index case in each country and territory. In addition, the agent mostly recommended a combination of lockdown and travel restrictions and higher intensity policies than the policies implemented by governments, but did not always encourage rapid full lockdown and full border closures. The limitation of this study was that it was done with incomplete data due to the emerging COVID-19 epidemic, inconsistent testing and reporting. In addition, our research focuses only on population health benefits by controlling the spread of COVID-19 without balancing the negative impacts of economic and social consequences.
Interpretation: Compared to actual government implementation, our algorithm mostly recommended earlier intensity of lockdown and travel restrictions. Reinforcement learning may be used as a decision support tool for implementation of public health interventions during COVID-19 and future pandemics.