Development of machine learning models predicting mortality using routinely collected observational health data from 0-59 months old children admitted to an intensive care unit in Bangladesh: critical role of biochemistry and haematology data

BMJ Paediatr Open. 2024 Jul 22;8(1):e002365. doi: 10.1136/bmjpo-2023-002365.

Abstract

Introduction: Treatment in the intensive care unit (ICU) generates complex data where machine learning (ML) modelling could be beneficial. Using routine hospital data, we evaluated the ability of multiple ML models to predict inpatient mortality in a paediatric population in a low/middle-income country.

Method: We retrospectively analysed hospital record data from children aged 0-59 months admitted to the ICU of the Dhaka hospital of the International Centre for Diarrhoeal Disease Research, Bangladesh. Five commonly used ML models were compared: logistic regression, least absolute shrinkage and selection operator (LASSO), elastic net, gradient boosting trees (GBT) and random forest (RF). Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Top predictors were identified using the RF mean decrease in Gini impurity (mean decrease Gini) as the feature importance measure.
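The two metrics named in the Method can be illustrated with a short, dependency-free Python sketch. It is not the authors' code: the function names `auroc`, `gini` and `gini_decrease` are hypothetical, and the sketch assumes binary labels with 1 = death and 0 = survival. AUROC is computed in its pairwise (Mann-Whitney) form, and `gini_decrease` gives the weighted impurity reduction produced by a single tree split. In a random forest, the mean decrease Gini of a feature accumulates such reductions over every split on that feature (weighted by the number of samples reaching each node) and averages them across trees.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen death (label 1) gets a
    higher predicted risk than a randomly chosen survivor (label 0).
    Ties count as half a win. Pairwise O(n_pos * n_neg): illustration only."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(labels: Sequence[int]) -> float:
    """Binary Gini impurity 1 - p^2 - (1 - p)^2, i.e. 2p(1 - p), with p = share of deaths."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def gini_decrease(parent: Sequence[int], left: Sequence[int], right: Sequence[int]) -> float:
    """Impurity decrease of one split: parent impurity minus the
    size-weighted impurities of the two child nodes."""
    n = len(parent)
    if n == 0:
        raise ValueError("parent node must contain at least one sample")
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four death/survivor pairs are ranked correctly. In practice a library implementation would normally be used; scikit-learn, for instance, provides `roc_auc_score` and exposes a normalised impurity-based importance as `feature_importances_` on a fitted `RandomForestClassifier`.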

Results: Data from 5669 children were available; after removal of records with missing data, 3505 patients remained (10% died, 90% survived). The mean patient age was 10.8 months (SD=10.5). Based on mean 10-fold cross-validation AUROC on the training data set, the top-performing models were RF and GBT. Hyperparameters were selected by cross-validation, and the tuned models were then evaluated on an unseen test set. The models used demographic, anthropometric, clinical, biochemistry and haematological data to predict mortality. RF consistently outperformed GBT and predicted mortality with an AUROC of ≥0.87 in the test set when three or more laboratory measurements were included. However, including a fourth laboratory measurement yielded only a very minor predictive gain (AUROC 0.87 vs 0.88). The best predictors were the biochemistry and haematological measurements, with the top predictors being total CO2, potassium, creatinine and total calcium.
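The model-selection step described in the Results (choosing hyperparameters by mean 10-fold cross-validation AUROC on the training set, before a single evaluation on the held-out test set) can be sketched in plain Python. This is an illustration under stated assumptions, not the study's pipeline. The abstract does not say whether folds were stratified, so `stratified_kfold` is an assumed choice that keeps the roughly 10% death rate similar in every fold, and `fit_and_score` is a hypothetical caller-supplied function that trains a model with a given hyperparameter on the training indices and returns its AUROC on the validation indices.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")  # type of a hyperparameter candidate


def stratified_kfold(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin so
    every fold gets a near-equal share of deaths and survivors (assumed stratification)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def select_hyperparameter(
    candidates: Sequence[P],
    labels: Sequence[int],
    fit_and_score: Callable[[P, List[int], List[int]], float],
    k: int = 10,
) -> Tuple[P, float]:
    """Return the candidate with the highest mean validation score over k folds.
    fit_and_score(param, train_idx, val_idx) is a hypothetical callback returning e.g. AUROC."""
    folds = stratified_kfold(labels, k)
    best_param, best_score = candidates[0], float("-inf")
    for param in candidates:
        scores = []
        for f, val_idx in enumerate(folds):
            # Training indices are every fold except the current validation fold.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(fit_and_score(param, train_idx, val_idx))
        cv_mean = mean(scores)
        if cv_mean > best_score:
            best_param, best_score = param, cv_mean
    return best_param, best_score
```

Only the training portion of the data should be passed in as `labels`; the test set stays untouched until the chosen hyperparameter is refitted and evaluated once. Stratification is worth considering at a 10% event rate, since purely random folds can end up with very few deaths in some validation folds, making their AUROC estimates noisy.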

Conclusions: Mortality in children admitted to the ICU can be predicted with high accuracy using RF models on a real-life data set that includes multiple laboratory measurements, with the most important features coming primarily from patient biochemistry and haematology.

Keywords: Health services research; Statistics.

MeSH terms

  • Bangladesh / epidemiology
  • Child, Preschool
  • Female
  • Hospital Mortality
  • Humans
  • Infant
  • Infant, Newborn
  • Intensive Care Units / statistics & numerical data
  • Machine Learning*
  • Male
  • ROC Curve
  • Retrospective Studies