A Knowledge Distillation Ensemble Framework for Predicting Short- and Long-Term Hospitalization Outcomes From Electronic Health Records Data

IEEE J Biomed Health Inform. 2022 Jan;26(1):423-435. doi: 10.1109/JBHI.2021.3089287. Epub 2022 Jan 17.

Abstract

The ability to perform accurate prognosis is crucial for proactive clinical decision-making, informed resource management and personalised care. Existing outcome prediction models suffer from low recall of infrequent positive outcomes. We present a highly scalable and robust machine learning framework to automatically predict adversity, represented by mortality and by ICU admission and readmission, from time series of vital signs and laboratory results obtained within the first 24 hours of hospital admission. The stacked ensemble platform comprises two components: a) an unsupervised LSTM Autoencoder that learns an optimal representation of the time series and uses it to differentiate the less frequent patterns that conclude with an adverse event from the majority patterns that do not, and b) a gradient boosting model, which relies on the constructed representation to refine the prediction by incorporating static features. The model is used to assess a patient's risk of adversity and provides visual justifications of its predictions. Results of three case studies show that the model outperforms existing platforms in ICU and general ward settings, achieving average areas under the precision-recall curve (PR-AUCs) of 0.891 (95% CI: 0.878-0.939) for mortality and 0.908 (95% CI: 0.870-0.935) for ICU admission and readmission.
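The abstract reports performance as PR-AUC, a metric suited to infrequent positive outcomes because it is built from precision and recall on the positive class only. The sketch below shows one common way to estimate it, the step-wise "average precision" sum. It is not taken from the paper, which does not state its estimator here. The function name `average_precision` and the toy labels and scores are hypothetical and serve only to illustrate the calculation.

```python
def average_precision(y_true, y_score):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumption: labels are 0/1 and scores are distinct (ties are not
    handled specially); this is an illustrative estimator, not the
    paper's stated method.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank cases from highest to lowest predicted risk.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the point where recall increases by 1 / n_pos.
            precision_sum += true_positives / rank
    return precision_sum / n_pos


# Hypothetical toy data: 2 adverse outcomes among 10 admissions.
y_true = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_score = [0.10, 0.40, 0.90, 0.20, 0.80, 0.30, 0.35, 0.05, 0.15, 0.25]

pr_auc = average_precision(y_true, y_score)
prevalence = sum(y_true) / len(y_true)  # expected PR-AUC of a random ranking
print(f"PR-AUC = {pr_auc:.3f}, prevalence baseline = {prevalence:.3f}")
```

In this toy case the two positives are ranked first and fourth, so the estimate is (1/1 + 2/4) / 2 = 0.75, against a prevalence baseline of 0.2. The baseline is what makes PR-AUC informative when adverse events are rare: a random ranking scores roughly the prevalence rather than 0.5.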

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Electronic Health Records*
  • Hospitalization
  • Humans
  • Length of Stay
  • Machine Learning*
  • ROC Curve
  • Retrospective Studies