Background: This study aimed to identify predictors of tooth loss in a large cohort of periodontitis patients in a university setting using a machine learning approach.
Methods: Information on periodontitis patients and 18 factors identified at the initial visit was extracted from electronic health records. A two-step machine learning pipeline was proposed to develop the tooth loss prediction model. The primary outcome was tooth loss count. In the first step, the RuleFit algorithm selected significant factors, either single factors or combinations of factors; in the second step, the selected factors were entered into a count regression model. Model performance was evaluated by root-mean-squared error (RMSE). Associations between predictors and tooth loss were also assessed with a classical statistical approach to validate the findings of the machine learning model.
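The two-step idea described in the Methods can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are made-up toy values (not study data), the three binary rules are hand-written stand-ins for what RuleFit would derive from tree ensembles, and Poisson regression with a log link is assumed as the count model, since the abstract does not name the specific count regression used.

```python
import math

# Toy, made-up records (NOT study data):
# (age, smoker, missing_teeth_at_baseline, teeth_lost)
records = [
    (35, 0, 0, 0), (42, 0, 1, 1), (48, 1, 2, 1), (55, 1, 4, 3),
    (61, 0, 3, 1), (66, 1, 6, 4), (70, 0, 5, 2), (73, 1, 8, 6),
]

# Step 1 (stand-in for RuleFit): hypothetical hand-written rules.
# Each rule is a single factor or a combination of factors.
rules = {
    "age >= 60": lambda r: r[0] >= 60,
    "smoker": lambda r: r[1] == 1,
    "smoker and missing >= 4": lambda r: r[1] == 1 and r[2] >= 4,
}

def featurize(record):
    """Intercept term followed by one 0/1 indicator per rule."""
    return [1.0] + [1.0 if rule(record) else 0.0 for rule in rules.values()]

X = [featurize(r) for r in records]
y = [r[3] for r in records]

# Step 2: Poisson regression (log link), fit by plain gradient ascent
# on the log-likelihood. Assumed count model, for illustration only.
def fit_poisson(X, y, lr=0.05, steps=20000):
    beta = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
            for j, x in enumerate(xi):
                grad[j] += (yi - mu) * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict(beta, xi):
    """Expected tooth loss count for one feature vector."""
    return math.exp(sum(b * x for b, x in zip(beta, xi)))

def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted counts."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

beta = fit_poisson(X, y)
preds = [predict(beta, xi) for xi in X]
model_rmse = rmse(y, preds)

# exp(coefficient) is the multiplicative change in expected count
# when the corresponding rule holds.
for name, b in zip(["intercept"] + list(rules), beta):
    print(f"{name}: rate ratio = {math.exp(b):.2f}")
print(f"in-sample RMSE = {model_rmse:.2f}")
```

Because each rule is a readable condition and each coefficient exponentiates to a rate ratio, the fitted model can be inspected rule by rule. The RMSE here is computed in-sample on the toy data, so it says nothing about predictive performance on real patients.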
Results: In total, 7840 patients were included. The machine learning model predicting tooth loss count achieved an RMSE of 2.71. Age, smoking, frequency of brushing, frequency of flossing, periodontal diagnosis, bleeding on probing percentage, number of missing teeth at baseline, and tooth mobility were associated with tooth loss in both the machine learning and classical statistical models.
Conclusion: The two-step machine learning pipeline is a feasible approach for predicting tooth loss in periodontitis patients. Compared to classical statistical methods, this rule-based machine learning approach improves model explainability. However, the model's generalizability needs further validation on external datasets.
Keywords: electronic health records; periodontal diseases; regression analysis; risk factors; supervised machine learning; tooth loss.
© 2023 American Academy of Periodontology.