Statistical analysis of self-reported health conditions in cohort studies: handling of missing onset age

J Clin Epidemiol. 2024 Sep:173:111458. doi: 10.1016/j.jclinepi.2024.111458. Epub 2024 Jul 9.

Abstract

Objectives: This paper discusses methodological challenges in epidemiological analysis of the association between a time-to-event outcome and hypothesized risk factors when the age or time at onset of the outcome is missing for some cases, a situation commonly encountered when the outcome is self-reported.

Study design and setting: The setting is a cohort study with long-term follow-up for outcome ascertainment, such as the Childhood Cancer Survivor Study (CCSS), a large cohort study of 5-year survivors of childhood cancer diagnosed in 1970-1999, in which the occurrence and age at onset of various chronic health conditions (CHCs) are self-reported in surveys. Simple methods for handling missing onset age, and the bias they can introduce into inference on the exposure-outcome association, are discussed. The interval-censored method is presented as a remedy for this problem. The finite-sample performance of these approaches is compared through Monte Carlo simulations. Examples from the CCSS include four CHCs (diabetes, myocardial infarction, osteoporosis/osteopenia, and growth hormone deficiency).
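To make the interval-censored idea concrete, the sketch below shows one way a self-reported record might be turned into bounds on the onset age. It is an illustration under assumptions not stated in the abstract: the function name `onset_interval`, its argument names, and the choice of bounds (the latest age at which the participant was known to be free of the condition as the lower bound, the age at the survey reporting the condition as the upper bound) are all hypothetical.

```python
import math
from typing import Optional, Tuple


def onset_interval(
    reported_event: bool,
    onset_age: Optional[float],
    last_event_free_age: float,
    report_age: float,
) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the onset age of a condition.

    Hypothetical encoding, not taken from the paper:
      - no event reported    -> right-censored: (report_age, inf)
      - event, onset known   -> exact: (onset_age, onset_age)
      - event, onset missing -> interval: (last_event_free_age, report_age)
    """
    if last_event_free_age > report_age:
        raise ValueError("last_event_free_age must not exceed report_age")
    if not reported_event:
        return (report_age, math.inf)
    if onset_age is not None:
        return (onset_age, onset_age)
    return (last_event_free_age, report_age)
```

Under this encoding a record with a missing onset age keeps the information that the event happened somewhere inside the interval, rather than being forced into a single complete value.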

Results: The interval-censored method can be applied in practice with standard statistical software. In the simulation study, the regression coefficient estimates from the interval-censored method consistently showed less bias and, in most cases, smaller standard deviations, and therefore smaller mean squared errors, than those from the simple approaches, regardless of the proportion of subjects with an event of interest, the proportion of missing onset age, and the sample size.
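As a rough illustration of what an interval-censored likelihood looks like, the sketch below fits a constant-hazard (exponential) model with no covariates. The abstract does not specify the regression model used, so the exponential model, the names `log_likelihood` and `fit_rate`, and the ternary-search optimizer are assumptions made only to show the three standard likelihood contributions: the density for an exact onset age, the survival function for a right-censored record, and the difference of two survival values for an interval-censored record.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper) bounds on onset age


def log_likelihood(rate: float, data: List[Interval]) -> float:
    """Log-likelihood of a constant hazard `rate` given interval data."""
    total = 0.0
    for lower, upper in data:
        if upper == lower:        # exact onset age: log density
            total += math.log(rate) - rate * lower
        elif math.isinf(upper):   # right-censored: log S(lower)
            total += -rate * lower
        else:                     # interval-censored: log{S(lower) - S(upper)}
            total += -rate * lower + math.log1p(-math.exp(-rate * (upper - lower)))
    return total


def fit_rate(data: List[Interval], lo: float = 1e-6, hi: float = 10.0) -> float:
    """Maximize the log-likelihood over [lo, hi] by ternary search.

    This relies on the exponential log-likelihood being concave in `rate`.
    """
    for _ in range(200):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, data) < log_likelihood(m2, data):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2
```

For example, `fit_rate([(0.0, 5.0), (5.0, math.inf)])` uses one record with onset somewhere before age 5 and one record still event-free at age 5. In a real analysis, covariates and a more flexible baseline hazard would be handled by the statistical software rather than by a hand-written optimizer like this one.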

Conclusion: The interval-censored method is a statistically valid and practical approach to the association analysis of self-reported time-to-event data when onset age may be missing. Although the simpler approaches that force such data into a complete-data form allow standard analytic methods to be applied, they incur a considerable loss in both accuracy and precision relative to the interval-censored method.

Keywords: Childhood cancer survivors; Late effects of cancer therapy; Missing data; Observational study; Patient-reported outcomes; Recall bias; Time-to-event regression.

MeSH terms

  • Adult
  • Age of Onset*
  • Bias
  • Cancer Survivors / statistics & numerical data
  • Child
  • Chronic Disease / epidemiology
  • Cohort Studies
  • Data Interpretation, Statistical
  • Female
  • Humans
  • Male
  • Neoplasms / epidemiology
  • Self Report*