Multiple imputation strategies for a bounded outcome variable in a competing risks analysis

Stat Med. 2021 Apr 15;40(8):1917-1929. doi: 10.1002/sim.8879. Epub 2021 Jan 19.

Abstract

In patient follow-up studies, events of interest may take place between periodic clinical assessments and so the exact time of onset is not observed. Such events are known as "bounded" or "interval-censored." Methods for handling such events can be categorized as either (i) applying multiple imputation (MI) strategies or (ii) taking a full likelihood-based (LB) approach. We focused on MI strategies, rather than LB methods, because of their flexibility. We evaluated MI strategies for bounded event times in a competing risks analysis, examining the extent to which interval boundaries, features of the data distribution and substantive analysis model are accounted for in the imputation model. Candidate imputation models were predictive mean matching (PMM); log-normal regression with postimputation back-transformation; normal regression with and without restrictions on the imputed values and Delord and Genin's method based on sampling from the cumulative incidence function. We used a simulation study to compare MI methods and one LB method when data were missing at random and missing not at random, also varying the proportion of missing data, and then applied the methods to a hematopoietic stem cell transplantation dataset. We found that cumulative incidence and median event time estimation were sensitive to model misspecification. In a competing risks analysis, we found that it is more important to account for features of the data distribution than to restrict imputed values based on interval boundaries or to ensure compatibility with the substantive analysis by sampling from the cumulative incidence function. We recommend MI by type 1 PMM.

Keywords: bounded data; competing risks; missing data; multiple imputation; predictive mean matching.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Data Interpretation, Statistical
  • Humans
  • Likelihood Functions
  • Research Design*
  • Risk Assessment