With the advent of massive survival data with a cure fraction, large-scale regression for analyzing the effects of risk factors on a general population has become a challenge. This article proposes a new probability-weighted method for estimation and inference in semiparametric cure regression models. We develop a flexible formulation of the mixture cure model in which the incidence is model-free and the latency follows a semiparametric proportional hazards model. The susceptible probability assesses the concordance between the observations and the latency. Using the susceptible probability as a weight, we propose a weighted estimating equation method for the small-scale setting. Robust nonparametric estimation of the weight permits stable estimation of the regression parameters. We further propose a recursive probability-weighted estimation method based on smaller data blocks, which achieves computational and memory efficiency in large-scale or online settings. Asymptotic properties of the proposed estimators are established. We conduct simulation studies and a real-data application to demonstrate the empirical performance of the proposed method.
Keywords: asymptotic normality; large-scale data; mixture cure model; proportional hazards model; streaming data; weighted estimating equation.
© The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society.
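For orientation, the mixture cure model named in the abstract is commonly written in the form sketched below. The notation is assumed here for illustration and is not taken from the article: \pi(z) is the incidence (the probability of being susceptible, left unspecified in a model-free formulation), S_u is the latency survival function under proportional hazards with baseline cumulative hazard \Lambda_0 and regression parameter \beta, and \delta_i is the event indicator. The weight w_i shown is the standard conditional probability of being susceptible given the observed data; the article's nonparametric weight estimator may differ in detail.

```latex
\begin{align*}
  % Population survival: cured subjects (prob. 1 - \pi) never experience the event
  S_{\mathrm{pop}}(t \mid x, z) &= 1 - \pi(z) + \pi(z)\, S_u(t \mid x), \\
  % Latency: proportional hazards model for susceptible subjects
  S_u(t \mid x) &= \exp\{-\Lambda_0(t)\, \exp(\beta^{\top} x)\}, \\
  % Susceptible probability given the observed (t_i, \delta_i):
  % equals 1 for observed events, and lies in (0, 1) for censored subjects
  w_i &= \delta_i + (1 - \delta_i)\,
         \frac{\pi(z_i)\, S_u(t_i \mid x_i)}
              {1 - \pi(z_i) + \pi(z_i)\, S_u(t_i \mid x_i)}.
\end{align*}
```

Under this reading, a subject with an observed event receives full weight, while a censored subject is down-weighted according to how plausible it is that the subject is still at risk rather than cured.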