Introduction: This prospective study aimed to develop a robust and clinically applicable method to identify patients with high-risk early-stage lung cancer and then to validate this method for use in future translational studies.
Methods: Three published Affymetrix microarray data sets representing 680 primary tumors were used to select survival-related genes by clustering, Cox modeling, and random survival forest analysis. A final set of 91 genes was selected and tested as a predictor of survival with a quantitative real-time polymerase chain reaction-based assay in an independent cohort of 101 lung adenocarcinomas.
Results: The random survival forest model built from 91 genes in the training set predicted patient survival in an independent cohort of 101 lung adenocarcinomas, with a prediction error rate of 26.6%. The mortality risk index was significantly related to survival (Cox model p < 0.00001) and separated all patients into low-, medium-, and high-risk groups (hazard ratios = 1.00, 2.82, and 4.42, respectively). The mortality risk index was also related to survival in stage I patients (Cox model p = 0.001), separating them into low-, medium-, and high-risk groups (hazard ratios = 1.00, 3.29, and 3.77, respectively).
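For readers unfamiliar with how such an error rate is obtained: random survival forest software commonly reports prediction error as 1 minus Harrell's concordance index (C-index). Assuming the 26.6% figure follows that convention (the abstract does not say so), it would correspond to a C-index of about 0.734. The sketch below is a minimal pure-Python illustration of that calculation, not the authors' code. The function names and the toy data are hypothetical, and tied survival times are simply skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk):
    """Harrell's C-index: share of usable patient pairs in which the
    patient with the higher predicted risk has the shorter survival.

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if censored
    risk   -- predicted risk score (e.g. a mortality risk index)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times; skip pairs whose earlier
        # time is censored, because their ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def prediction_error(times, events, risk):
    """Error rate under the assumed convention: 1 - C-index."""
    return 1.0 - concordance_index(times, events, risk)


# Toy data, invented for illustration only (not from the study).
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]

print(round(concordance_index(toy_times, toy_events, toy_risk), 3))  # 0.8
print(round(prediction_error(toy_times, toy_events, toy_risk), 3))   # 0.2
```

An error rate of 0.5 would mean the risk score orders patients no better than chance, and 0 would mean perfect ordering.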
Conclusions: The development and validation of this robust quantitative real-time polymerase chain reaction platform allows prediction of survival in patients with early-stage lung cancer. Investigators can now evaluate the platform prospectively by incorporating it into new clinical trials, with the goal of personalizing treatment for patients with lung cancer and improving their survival.