A robust regression model for bounded count health data

Stat Methods Med Res. 2024 Aug;33(8):1392-1411. doi: 10.1177/09622802241259178. Epub 2024 Jun 7.

Abstract

Bounded count response data arise naturally in health applications. In general, the well-known beta-binomial regression model forms the basis for analyzing such data, especially when the data are overdispersed. Little attention, however, has been given in the literature to the possibility of having both extreme observations and overdispersion. We propose in this work an extension of the beta-binomial regression model, named the beta-2-binomial regression model, which provides a flexible approach for fitting a regression model to a wide spectrum of bounded count response data sets in the presence of overdispersion, outliers, or an excess of extreme observations. This distribution possesses more skewness and kurtosis than the beta-binomial model but preserves the same mean and variance form as the beta-binomial model. Additional properties of the beta-2-binomial distribution are derived, including its behavior at the limits of its parameter space. A penalized maximum likelihood approach is used to estimate the parameters of the model, and a residual analysis is included to assess departures from model assumptions and to detect outlying observations. Simulation studies assessing robustness to outliers confirm that the beta-2-binomial regression model is a more robust alternative than the binomial and beta-binomial regression models. We also found that the beta-2-binomial regression model outperformed the binomial and beta-binomial regression models in our applications: predicting liver cancer development in mice and modeling the number of inappropriate days a patient spent in a hospital.
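
As an illustration of the overdispersion that motivates the beta-binomial baseline, the following minimal sketch computes the mean and variance of a beta-binomial distribution numerically and compares the variance with the binomial one. It assumes a common mean/dispersion parameterization (mean proportion `mu`, intra-class correlation `rho`), which is not necessarily the one used in the paper; the function names are hypothetical. The beta-2-binomial distribution itself is not sketched because its form is not given in this abstract.

```python
import math


def beta_binomial_pmf(y, n, mu, rho):
    """P(Y = y) for a beta-binomial with n trials.

    Assumed parameterization (not taken from the paper):
    mu in (0, 1) is the mean proportion, rho in (0, 1) the dispersion,
    with beta shape parameters a = mu(1-rho)/rho and b = (1-mu)(1-rho)/rho.
    """
    a = mu * (1.0 - rho) / rho
    b = (1.0 - mu) * (1.0 - rho) / rho
    # log of the binomial coefficient C(n, y)
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    # log B(y + a, n - y + b) - log B(a, b)
    log_num = math.lgamma(y + a) + math.lgamma(n - y + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(log_choose + log_num - log_den)


def beta_binomial_moments(n, mu, rho):
    """Mean and variance obtained by summing over the support 0..n."""
    pmf = [beta_binomial_pmf(y, n, mu, rho) for y in range(n + 1)]
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var


n, mu, rho = 10, 0.3, 0.2
mean, var = beta_binomial_moments(n, mu, rho)
binomial_var = n * mu * (1.0 - mu)
# Closed-form beta-binomial variance: n*mu*(1-mu)*(1 + (n-1)*rho)
closed_form_var = binomial_var * (1.0 + (n - 1) * rho)
print(mean, var, binomial_var, closed_form_var)
```

Under this parameterization the variance exceeds the binomial variance by the factor 1 + (n - 1)rho, which is the overdispersion the beta-binomial model captures; according to the abstract, the beta-2-binomial model keeps this mean and variance form while allowing more skewness and kurtosis.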

Keywords: Count data; beta-2-binomial; beta-binomial; generalized additive model for location, scale and shape; penalized maximum-likelihood estimation; regression models.

MeSH terms

  • Animals
  • Humans
  • Likelihood Functions
  • Liver Neoplasms
  • Models, Statistical*
  • Regression Analysis