Prediction model for major bleeding in anticoagulated patients with cancer-associated venous thromboembolism using machine learning and natural language processing

Clin Transl Oncol. 2024 Sep 14. doi: 10.1007/s12094-024-03586-2. Online ahead of print.

Abstract

Purpose: We developed a predictive model to assess the risk of major bleeding (MB) within 6 months of primary venous thromboembolism (VTE) in cancer patients receiving anticoagulant treatment. We also sought to describe the prevalence and incidence of VTE in cancer patients, and to describe clinical characteristics at baseline and bleeding events during follow-up in patients receiving anticoagulants.

Methods: This observational, retrospective, multicenter study used natural language processing and machine learning (ML) to analyze unstructured clinical data from the electronic health records of nine Spanish hospitals between 2014 and 2018. All adult cancer patients with VTE receiving anticoagulants were included. Both clinically driven and ML-driven feature selection were performed to identify MB predictors. Logistic regression (LR), decision tree (DT), and random forest (RF) algorithms were used to train predictive models, which were validated in a hold-out dataset and compared to the previously developed CAT-BLEED score.
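The train-then-validate design described in the Methods can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic, the two features (a standardized hemoglobin value and a metastasis flag) and their simulated effect sizes are assumptions chosen for illustration, and the logistic regression is a plain gradient-descent fit written with the standard library only. The helper names `stratified_holdout`, `fit_logistic`, and `predict_proba` are hypothetical.

```python
import math
import random

def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test, keeping the event rate similar in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Predicted event probability for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Synthetic cohort (assumption): lower hemoglobin and metastasis raise bleeding risk.
rng = random.Random(42)
X, y = [], []
for _ in range(1000):
    hgb_z = rng.gauss(0.0, 1.0)                      # standardized hemoglobin
    metastasis = 1.0 if rng.random() < 0.4 else 0.0  # metastasis flag
    p = sigmoid(-2.0 - 0.8 * hgb_z + 0.7 * metastasis)
    X.append([hgb_z, metastasis])
    y.append(1 if rng.random() < p else 0)

# Train on 80% of patients; keep 20% untouched for validation.
train_idx, test_idx = stratified_holdout(y, test_fraction=0.2, seed=1)
w, b = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
test_probs = predict_proba([X[i] for i in test_idx], w, b)
```

The hold-out rows are never seen during fitting, so `test_probs` can be scored against the true outcomes to estimate performance on unseen patients. The same split would be reused for each candidate model so that the comparison is like for like.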

Results: Of the 2,893,108 cancer patients screened, in-hospital VTE prevalence was 5.8% and the annual incidence ranged from 2.7 to 3.9%. We identified 21,227 patients with active cancer and VTE receiving anticoagulants (53.9% men, median age of 70 years). MB events after VTE diagnosis occurred in 10.9% of patients within the first six months. MB predictors included hemoglobin, metastasis, age, platelets, leukocytes, and serum creatinine. The LR, DT, and RF models had AUC-ROC (95% confidence interval) values of 0.60 (0.55, 0.65), 0.60 (0.55, 0.65), and 0.61 (0.56, 0.66), respectively. All three models outperformed the CAT-BLEED score, which had an AUC-ROC of 0.53 (0.48, 0.59).
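For readers unfamiliar with the metric reported above, the following sketch shows one way an AUC-ROC and a 95% confidence interval can be computed. The abstract does not state how the intervals were obtained; the percentile bootstrap used here is an assumption, and the function names `auc_roc` and `bootstrap_ci` and the simulated scores are hypothetical.

```python
import random

def auc_roc(y_true, scores):
    """AUC as the probability that a random event case scores higher
    than a random non-event case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling patients with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one outcome class
        stats.append(auc_roc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Four-patient toy case: three of the four event/non-event pairs are ranked correctly.
toy_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75

# Simulated hold-out set (assumption): weakly informative risk scores.
rng = random.Random(7)
y_sim = [1 if rng.random() < 0.15 else 0 for _ in range(200)]
s_sim = [rng.gauss(0.3 if y == 1 else 0.0, 1.0) for y in y_sim]
point = auc_roc(y_sim, s_sim)
ci_low, ci_high = bootstrap_ci(y_sim, s_sim)
```

An AUC-ROC of 0.5 corresponds to ranking patients no better than chance and 1.0 to perfect ranking, which is the scale on which the values of 0.60 to 0.61 for the ML models and 0.53 for CAT-BLEED should be read.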

Conclusions: Our study shows encouraging results in identifying anticoagulated patients with cancer-associated VTE who are at high risk of MB.

Keywords: Anticoagulants; Cancer; Machine learning; Major bleeding; Natural language processing; Venous thromboembolism.