GenomeForest: An Ensemble Machine Learning Classifier for Endometriosis

AMIA Jt Summits Transl Sci Proc. 2020 May 30:2020:33-42. eCollection 2020.

Abstract

Endometriosis is a complex, high-impact disease affecting 176 million women worldwide, with a diagnostic latency of 4 to 11 years due to the lack of a definitive clinical symptom or a minimally invasive diagnostic method. In this study, we developed a new ensemble machine learning classifier based on chromosomal partitioning, named GenomeForest, and applied it to classify endometriosis versus control patients using 38 RNA-seq and 80 enrichment-based DNA-methylation (MBD-seq) datasets, assessing its performance in six different experiments. The ensemble machine learning models provided an avenue for identifying several candidate biomarker genes and achieved very high F1 scores: a near-perfect 0.968 for the transcriptomics dataset and 0.918 for the methylomics dataset. We hope that, in the future, a less invasive biopsy can be used to diagnose endometriosis using the findings from ensemble machine learning classifiers such as the one demonstrated in this study.
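To make the idea of an ensemble built on chromosomal partitioning more concrete, the following is a minimal sketch and not the GenomeForest implementation. The abstract does not specify the base learners or how their outputs are combined, so this sketch assumes a simple nearest-centroid learner per chromosome and a majority vote across chromosomes. All gene names, chromosome assignments, expression values, and function names are hypothetical and chosen only for illustration. The F1 score is computed in the standard way, as the harmonic mean of precision and recall.

```python
from collections import Counter, defaultdict


def partition_by_chromosome(gene_to_chrom):
    """Group feature (gene) names by the chromosome they map to."""
    groups = defaultdict(list)
    for gene, chrom in gene_to_chrom.items():
        groups[chrom].append(gene)
    return dict(groups)


def fit_centroids(samples, labels, genes):
    """Assumed base learner: one mean vector per class over `genes`."""
    counts = Counter(labels)
    sums = {}
    for sample, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(genes))
        for i, gene in enumerate(genes):
            acc[i] += sample[gene]
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict_centroid(centroids, sample, genes):
    """Return the class whose centroid is closest (squared Euclidean)."""
    def dist(center):
        return sum((sample[g] - center[i]) ** 2 for i, g in enumerate(genes))
    return min(sorted(centroids), key=lambda lab: dist(centroids[lab]))


def fit_ensemble(samples, labels, gene_to_chrom):
    """Train one base learner per chromosome partition."""
    groups = partition_by_chromosome(gene_to_chrom)
    return {chrom: (genes, fit_centroids(samples, labels, genes))
            for chrom, genes in groups.items()}


def predict_ensemble(ensemble, sample):
    """Assumed aggregation: majority vote; ties go to the first label in sorted order."""
    votes = Counter(predict_centroid(cents, sample, genes)
                    for genes, cents in ensemble.values())
    top = max(votes.values())
    return sorted(lab for lab, n in votes.items() if n == top)[0]


def f1_score(y_true, y_pred, positive):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four genes on three chromosomes, four samples.
gene_to_chrom = {"G1": "chr1", "G2": "chr1", "G3": "chr2", "G4": "chrX"}
samples = [
    {"G1": 5.0, "G2": 4.8, "G3": 1.0, "G4": 0.9},
    {"G1": 5.2, "G2": 5.1, "G3": 1.2, "G4": 1.1},
    {"G1": 1.0, "G2": 0.9, "G3": 4.9, "G4": 5.0},
    {"G1": 1.1, "G2": 1.2, "G3": 5.1, "G4": 4.8},
]
labels = ["case", "case", "control", "control"]

ensemble = fit_ensemble(samples, labels, gene_to_chrom)
predictions = [predict_ensemble(ensemble, s) for s in samples]
print(predictions, f1_score(labels, predictions, positive="case"))
```

The toy example scores the model on its own training samples only to keep the sketch short. A real evaluation would score held-out samples, for example through cross-validation.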