The effect of data aggregation on dispersion estimates in count data models

Adam Errington; Jochen Einbeck; Jonathan Cumming; Ute Rössler; David Endesfelder

doi:10.1515/ijb-2020-0079

Int J Biostat. 2021 May 7;18(1):183-202. doi: 10.1515/ijb-2020-0079.

Authors

Adam Errington¹, Jochen Einbeck¹, Jonathan Cumming¹, Ute Rössler², David Endesfelder²

Affiliations

¹ Department of Mathematical Sciences, Durham University, Durham, UK.
² Bundesamt für Strahlenschutz (BfS), Oberschleissheim, Germany.

PMID: 33962495
DOI: 10.1515/ijb-2020-0079

Abstract

For the modelling of count data, aggregation of the raw data over certain subgroups or predictor configurations is common practice. This is, for instance, the case for count data biomarkers of radiation exposure. Under the Poisson law, count data can be aggregated without loss of information on the Poisson parameter, which remains true if the Poisson assumption is relaxed towards quasi-Poisson. However, in biodosimetry in particular, but also beyond, the question of how the dispersion estimates for quasi-Poisson models behave under data aggregation has received little attention. Indeed, for real data sets featuring unexplained heterogeneities, dispersion estimates can increase strongly after aggregation, an effect which we will demonstrate and quantify explicitly for some scenarios. The increase in dispersion estimates implies an inflation of the parameter standard errors, which, however, by comparison with random effect models, can be shown to serve a corrective purpose. The phenomena are illustrated by γ-H2AX foci data as used, for instance, in radiation biodosimetry for the calibration of dose-response curves.

Keywords: heterogeneity; overdispersion; quasi-Poisson; radiation biomarker; random effect.
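
The aggregation effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes an intercept-only quasi-Poisson model, a gamma-distributed multiplicative group effect as the source of unexplained heterogeneity, and the Pearson estimator of dispersion. The function names (`pearson_dispersion`, `simulate`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def pearson_dispersion(counts, expected, n_params=1):
    """Pearson dispersion estimate: chi-square statistic / residual degrees of freedom."""
    counts = np.asarray(counts, dtype=float)
    expected = np.asarray(expected, dtype=float)
    chi2 = np.sum((counts - expected) ** 2 / expected)
    return chi2 / (counts.size - n_params)


def simulate(n_groups=200, n_per_group=50, mu=2.0, shape=20.0, seed=0):
    """Compare raw and aggregated dispersion estimates (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # Assumed heterogeneity: one gamma random effect per group, with mean 1.
    u = rng.gamma(shape, 1.0 / shape, size=n_groups)

    # Raw counts: Poisson within each group, conditional on the group effect.
    y = rng.poisson(mu * u[:, None], size=(n_groups, n_per_group))

    # Raw data: the intercept-only estimate of the mean is the overall average.
    mu_raw = y.sum() / y.size
    phi_raw = pearson_dispersion(y.ravel(), np.full(y.size, mu_raw))

    # Aggregated data: one total per group; expected total is n_per_group * mean.
    totals = y.sum(axis=1)
    mu_agg = totals.sum() / (n_groups * n_per_group)
    phi_agg = pearson_dispersion(totals, np.full(n_groups, n_per_group * mu_agg))

    # Quasi-Poisson standard error of the log-mean in an intercept-only model:
    # sqrt(dispersion / total count).
    total_count = y.sum()
    return {
        "mu_raw": mu_raw,
        "mu_agg": mu_agg,
        "phi_raw": phi_raw,
        "phi_agg": phi_agg,
        "se_raw": np.sqrt(phi_raw / total_count),
        "se_agg": np.sqrt(phi_agg / total_count),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the mean estimate is the same for raw and aggregated data, while the dispersion estimate, and with it the standard error, is larger after aggregation, because the between-group variation is absorbed into the group totals.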

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Data Aggregation*
Poisson Distribution