Background: Screening entire populations for diabetes is not cost-effective. Hence, an efficient screening process must select those people who are at high risk for diabetes. In this study, we investigated whether screening procedures could be improved using an extended predictive feature search.
Materials and methods: To develop our model and identify persons with diabetes (prevalence), we used 2005-2010 data from the National Health and Nutrition Examination Survey, which had not previously been explored for this purpose. We evaluated all combinations of predictors to identify the optimal subset, and we used a linear logistic classification model to predict diabetes. V-fold cross-validation was used both for selecting variables and for validating the final models. The new model was compared with two established models.
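The search procedure described above can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the predictor names (`signal_a`, `signal_b`, `noise`) are hypothetical, the logistic model is fitted with a simple gradient descent, and the selection criterion (mean held-out AUC over V folds) is an assumption made for illustration.

```python
import itertools
import math
import random


def predict_proba(w, x):
    """Logistic model: w[0] is the intercept, w[1:] are the coefficients."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=100):
    """Fit the model by plain batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(X, y, columns, v=5):
    """Mean held-out AUC over v folds, using only the given predictor columns."""
    Xs = [[row[c] for c in columns] for row in X]
    fold_aucs = []
    for k in range(v):
        # Row i belongs to fold i % v (a simple deterministic assignment).
        train = [i for i in range(len(y)) if i % v != k]
        test = [i for i in range(len(y)) if i % v == k]
        w = fit_logistic([Xs[i] for i in train], [y[i] for i in train])
        scores = [predict_proba(w, Xs[i]) for i in test]
        fold_aucs.append(auc(scores, [y[i] for i in test]))
    return sum(fold_aucs) / v


def best_subset(X, y, names, v=5):
    """Score every non-empty combination of predictors and keep the best one."""
    best = (-1.0, ())
    for size in range(1, len(names) + 1):
        for columns in itertools.combinations(range(len(names)), size):
            score = cv_auc(X, y, columns, v)
            if score > best[0]:
                best = (score, tuple(names[c] for c in columns))
    return best


# Synthetic data (assumption): two informative predictors and one pure-noise predictor.
rng = random.Random(0)
names = ["signal_a", "signal_b", "noise"]
X, y = [], []
for _ in range(200):
    row = [rng.gauss(0, 1) for _ in names]
    logit = 1.5 * row[0] + 1.5 * row[1]
    y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-logit)) else 0)
    X.append(row)

score, subset = best_subset(X, y, names)
print(f"best subset: {subset}, cross-validated AUC: {score:.2f}")
```

With p candidate predictors, an exhaustive search fits 2^p - 1 models per fold, so this approach is only practical for a modest number of predictors.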
Results: In total, 5,398 participants were included in this study. Among these, 478 participants had unidentified diabetes. The established models had an area under the receiver operating characteristic curve (AUC) of 0.74 and 0.71, compared with an AUC of 0.78 for the new model, a significant difference (P<0.05). A proposed cutoff point for the established models yielded sensitivities/specificities of 63%/72% and 40%/72%, respectively, compared with a sensitivity/specificity of 70%/72% for the new model.
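For readers less familiar with these metrics, the following sketch shows how sensitivity, specificity, and AUC are computed from predicted risk scores. The scores, labels, and cutoff are toy values chosen for illustration and are not taken from the study.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Treat score >= cutoff as a positive screen and compare with the true labels."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < cutoff)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predicted risks and true disease status (1 = diabetes); illustrative only.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

sens, spec = sensitivity_specificity(scores, labels, cutoff=0.5)
area = auc(scores, labels)
print(sens, spec, area)  # 0.6 0.8 0.88
```

Raising the cutoff trades sensitivity for specificity, which is why models are easiest to compare at a shared specificity, as in the figures reported above.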
Conclusions: Our data indicate that simple healthcare and economic information, such as the ratio of family income to poverty, can add value in deciding who is at risk of unknown diabetes when predictor combinations are investigated extensively.