Identifying and ranking potential driver genes of Alzheimer's disease using multiview evidence aggregation

Sumit Mukherjee; Thanneer M Perumal; Kenneth Daily; Solveig K Sieberts; Larsson Omberg; Christoph Preuss; Gregory W Carter; Lara M Mangravite; Benjamin A Logsdon

doi:10.1093/bioinformatics/btz365


Bioinformatics. 2019 Jul 15;35(14):i568-i576. doi: 10.1093/bioinformatics/btz365.

Authors

Sumit Mukherjee¹, Thanneer M Perumal¹, Kenneth Daily¹, Solveig K Sieberts¹, Larsson Omberg¹, Christoph Preuss², Gregory W Carter², Lara M Mangravite¹, Benjamin A Logsdon¹

Affiliations

¹ Sage Bionetworks, Seattle, WA, USA.
² The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA.

Abstract

Motivation: Late-onset Alzheimer's disease currently has no known effective treatment options. To better understand the disease, new multi-omic datasets have recently been generated with the goal of identifying its molecular causes. However, most analytic studies using these datasets focus on unimodal analysis of the data. Here, we propose a data-driven approach that integrates multiple data types and analytic outcomes to aggregate evidence supporting the hypothesis that a gene is a genetic driver of the disease. The main algorithmic contributions of our article are: (i) a general machine learning framework that learns the key characteristics of a few known driver genes from multiple feature sets and identifies other potential driver genes with similar feature representations, and (ii) a flexible ranking scheme that can integrate external validation in the form of genome-wide association study (GWAS) summary statistics. While we currently focus on demonstrating the effectiveness of the approach using different analytic outcomes from RNA-Seq studies, the method is easily generalizable to other data modalities and analysis types.
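To make the two contributions concrete, the following is a minimal sketch and not the authors' algorithm (their code is in the repository linked below). It assumes each "view" is a mapping from gene name to a numeric feature vector. As a simple stand-in for the learned model, it scores each gene by its cosine similarity to the centroid of the known drivers in every view, averaged across views. It then blends that score with GWAS evidence, taken as a hypothetical gene-level p-value converted to -log10 and rescaled. The function names `multiview_similarity` and `combined_rank` and the weight `alpha` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def multiview_similarity(views: List[Dict[str, Vector]],
                         known_drivers: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the learned model: average, over views,
    each gene's cosine similarity to the centroid of the known drivers."""
    genes = set(views[0])
    for view in views[1:]:
        genes &= set(view)  # keep only genes measured in every view
    scores = {g: 0.0 for g in genes}
    for view in views:
        drivers = [view[g] for g in known_drivers if g in view]
        if not drivers:
            raise ValueError("each view must contain at least one known driver")
        dim = len(drivers[0])
        centroid = [sum(v[i] for v in drivers) / len(drivers) for i in range(dim)]
        for g in genes:
            scores[g] += cosine(view[g], centroid) / len(views)
    return scores


def combined_rank(ml_scores: Dict[str, float], gwas_p: Dict[str, float],
                  alpha: float = 0.5) -> List[str]:
    """Rank genes by alpha * similarity score + (1 - alpha) * GWAS evidence,
    where GWAS evidence is -log10(p) scaled to [0, 1]; missing p-values count as 1."""
    evidence = {g: -math.log10(gwas_p.get(g, 1.0)) for g in ml_scores}
    top = max(evidence.values()) or 1.0  # avoid dividing by zero if no gene has evidence
    combined = {g: alpha * ml_scores[g] + (1 - alpha) * evidence[g] / top
                for g in ml_scores}
    return sorted(combined, key=combined.get, reverse=True)


# Toy usage with made-up genes and features (not data from the paper).
views = [
    {"DRIVER_1": [1.0, 0.0], "DRIVER_2": [1.0, 0.0], "GENE_A": [1.0, 0.0], "GENE_B": [0.0, 1.0]},
    {"DRIVER_1": [0.0, 1.0], "DRIVER_2": [0.0, 1.0], "GENE_A": [0.0, 1.0], "GENE_B": [1.0, 0.0]},
]
scores = multiview_similarity(views, ["DRIVER_1", "DRIVER_2"])
ranking = combined_rank(scores, {"GENE_A": 1e-8, "GENE_B": 0.5})
```

The linear blend via `alpha` is only one way to "integrate external validation". A rank-product or Bayesian combination would be equally reasonable choices for a sketch like this. In practice, known drivers would usually be excluded from the final candidate list.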

Results: We demonstrate the utility of our machine learning algorithm on two benchmark multiview datasets, where it significantly outperforms the baseline approaches in predicting missing labels. We then use the algorithm to predict and rank potential drivers of Alzheimer's disease. We show that our ranked genes are significantly enriched for single nucleotide polymorphisms (SNPs) associated with Alzheimer's disease and are enriched in pathways that have previously been associated with the disease.
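The abstract does not say which statistical test underlies the enrichment claims. As an illustration only, a one-sided hypergeometric test is a common way to ask whether a pathway is over-represented among top-ranked genes. The sketch below computes P(X >= k), where N is the number of genes in the background, K the number in the pathway, n the number of top-ranked genes, and k the overlap. The function name `hypergeom_enrichment_p` is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: background genes in the pathway,
    n: top-ranked genes tested, k: top-ranked genes that fall in the pathway.
    """
    upper = min(K, n)  # X cannot exceed the pathway size or the number of draws
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical example: 20,000 background genes, a 500-gene pathway,
# 200 top-ranked genes, 15 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20_000, 500, 200, 15)
```

Exact integer arithmetic via `math.comb` avoids overflow but grows slowly for very large backgrounds. A library routine such as SciPy's hypergeometric survival function would be the usual choice at genome scale, with a multiple-testing correction across pathways.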

Availability and implementation: Source code and a link to all feature sets are available at https://github.com/Sage-Bionetworks/EvidenceAggregatedDriverRanking.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Alzheimer Disease* / genetics
Genome-Wide Association Study*
Humans
Machine Learning
Software
