Pollutants-mediated viral hepatitis in different types: assessment of different algorithms and time series models

Sci Rep. 2024 Sep 10;14(1):21141. doi: 10.1038/s41598-024-72047-1.

Abstract

The escalating frequency of environmental pollution incidents has raised significant concerns regarding the potential health impacts of pollutant fluctuations. Consequently, a comprehensive study on the role of pollutants in the prevalence of viral hepatitis is indispensable for the advancement of innovative prevention strategies. Monthly incidence rates of viral hepatitis from 2005 to 2020 were sourced from the Chinese Center for Disease Control and Prevention Infectious Disease Surveillance Information System. Pollution data spanning 2014-2020 were obtained from the National Oceanic and Atmospheric Administration (NOAA), encompassing pollutants such as CO, NO2, and O3. Time series analysis models, including seasonal auto-regressive integrated moving average (SARIMA), Holt-Winters model, and Generalized Additive Model (GAM), were employed to explore prediction and synergistic effects related to viral hepatitis. Spearman correlation analysis was used to identify pollutants suitable for inclusion in these models. Concurrently, machine learning (ML) algorithms were used to refine the prediction of environmental pollutant levels. Finally, a weighted quantile sum (WQS) regression framework was developed to evaluate the singular and combined impacts of pollutants on viral hepatitis cases across different demographics, age groups, and environmental strata. The incidence of viral hepatitis in Beijing exhibited a declining trend, primarily characterized by HBV and HCV types. In predicting hepatitis prevalence trends, the Holt-Winters additive seasonal model outperformed the multiplicative SARIMA(1,1,0)(2,1,0)[12] model. In the prediction of environmental pollutants, the SVM model demonstrated superior performance over the GPR model, particularly with Polynomial and Besseldot kernel functions. The combined pollutant risk effect on viral hepatitis was quantified as βWQS (95% CI) = 0.066 (0.018, 0.114).
Among different groups, PM2.5 emerged as the most sensitive risk factor, notably impacting patients with HCV and HEV, as well as individuals aged 35-64. CO predominantly affected HAV patients, showing a risk effect of βWQS (95% CI) = -0.0355 (-0.0695, -0.0016). Lower levels of PM2.5 and PM10 were associated with heightened risk of viral hepatitis incidence with a lag of five months, whereas elevated levels of PM2.5 (100-120 μg/m³) and CO correlated with increased hepatitis incidence risk with a lag of six months. The Holt-Winters model outperformed the SARIMA model in predicting the incidence of viral hepatitis. Among machine learning algorithms, SVM and GPR models demonstrated superior performance for analyzing pollutant data. Patients infected with HAV and HEV were primarily influenced by PM10 and CO, whereas SO2 and PM2.5 significantly impacted others. Individuals aged 35-64 years appeared particularly susceptible to these pollutants. Mixed pollutant exposures were found to affect the development of viral hepatitis with a notable lag of 5-6 months. These findings underscore the importance of long-term monitoring of pollutants in relation to viral hepatitis incidence.
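For readers unfamiliar with the WQS framework named above, the following is a minimal sketch of how a weighted quantile sum index can be formed: each pollutant series is converted to quantile scores, and the scores are combined with non-negative weights that sum to 1. The function names (`quantile_scores`, `wqs_index`), the use of quartiles, and the example weights are illustrative assumptions, not the authors' implementation. In a full WQS analysis the weights are estimated from the data (typically over bootstrap samples) and the outcome is then regressed on the index to obtain βWQS; that estimation step is not shown here.

```python
from bisect import bisect_right
from math import isclose
from statistics import quantiles


def quantile_scores(values, n_quantiles=4):
    """Map each value to a quantile bin numbered 0 .. n_quantiles - 1."""
    # Cut points between bins, computed from the observed data (assumed choice:
    # the "inclusive" method, which treats the data as the full population).
    cuts = quantiles(values, n=n_quantiles, method="inclusive")
    # The number of cut points at or below a value is its bin number.
    return [bisect_right(cuts, v) for v in values]


def wqs_index(exposures, weights, n_quantiles=4):
    """Weighted sum of quantile scores for each observation.

    exposures: dict mapping pollutant name -> list of measurements
    weights:   dict mapping pollutant name -> non-negative weight (sum to 1)
    Both arguments are hypothetical inputs; real weights would be estimated.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if not isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")

    scores = {name: quantile_scores(vals, n_quantiles)
              for name, vals in exposures.items()}
    n_obs = len(next(iter(scores.values())))
    return [sum(weights[name] * scores[name][j] for name in scores)
            for j in range(n_obs)]


if __name__ == "__main__":
    # Made-up monthly values, for illustration only.
    demo = {"PM2.5": [1, 2, 3, 4, 5, 6, 7, 8],
            "CO": [8, 7, 6, 5, 4, 3, 2, 1]}
    print(wqs_index(demo, {"PM2.5": 0.75, "CO": 0.25}))
```

Constraining the weights to be non-negative and to sum to 1 is what lets each weight be read as a pollutant's relative contribution to the mixture effect, which is how statements such as "PM2.5 emerged as the most sensitive risk factor" are usually derived.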

Keywords: Machine learning; Pollutants; Time series; Viral hepatitis; Weighted quantile sum.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Air Pollutants / adverse effects
  • Air Pollutants / analysis
  • Algorithms*
  • Child
  • Child, Preschool
  • China / epidemiology
  • Environmental Pollutants / adverse effects
  • Environmental Pollution / adverse effects
  • Female
  • Hepatitis, Viral, Human / epidemiology
  • Humans
  • Incidence
  • Infant
  • Machine Learning
  • Male
  • Middle Aged
  • Prevalence
  • Seasons
  • Young Adult

Substances

  • Environmental Pollutants
  • Air Pollutants