Mergeomics: multidimensional data integration to identify pathogenic perturbations to biological systems

Le Shu; Yuqi Zhao; Zeyneb Kurt; Sean Geoffrey Byars; Taru Tukiainen; Johannes Kettunen; Luz D Orozco; Matteo Pellegrini; Aldons J Lusis; Samuli Ripatti; Bin Zhang; Michael Inouye; Ville-Petteri Mäkinen; Xia Yang

doi:10.1186/s12864-016-3198-9

BMC Genomics. 2016 Nov 4;17(1):874. doi: 10.1186/s12864-016-3198-9.

Authors

Le Shu¹, Yuqi Zhao¹, Zeyneb Kurt¹, Sean Geoffrey Byars²,³, Taru Tukiainen⁴, Johannes Kettunen⁴, Luz D Orozco⁵, Matteo Pellegrini⁵, Aldons J Lusis⁶, Samuli Ripatti⁴, Bin Zhang⁷, Michael Inouye²,³,⁸, Ville-Petteri Mäkinen⁹,¹⁰,¹¹,¹², Xia Yang¹³,¹⁴

Affiliations

¹ Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, USA.
² Center for Systems Genomics, University of Melbourne, Melbourne, Australia.
³ School of BioSciences, University of Melbourne, Melbourne, Australia.
⁴ Institute for Molecular Medicine, Helsinki, Finland.
⁵ Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA.
⁶ Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
⁷ Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁸ Department of Pathology, University of Melbourne, Melbourne, Australia.
⁹ Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, USA. ville-petteri.makinen@sahmri.com.
¹⁰ South Australian Health and Medical Research Institute, Adelaide, Australia. ville-petteri.makinen@sahmri.com.
¹¹ School of Biological Sciences, University of Adelaide, Adelaide, Australia. ville-petteri.makinen@sahmri.com.
¹² Computational Medicine, Faculty of Medicine, University of Oulu and Biocenter Oulu, Oulu, Finland. ville-petteri.makinen@sahmri.com.
¹³ Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, USA. xyang123@ucla.edu.
¹⁴ Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA, USA. xyang123@ucla.edu.

Abstract

Background: Complex diseases are characterized by multiple subtle perturbations to biological processes. New omics platforms can detect these perturbations, but translating the diverse molecular and statistical information into testable mechanistic hypotheses is challenging. Therefore, we set out to create a public tool that integrates these data across multiple datasets, platforms, study designs and species in order to detect the most promising targets for further mechanistic studies.

Results: We developed Mergeomics, a computational pipeline consisting of independent modules that 1) leverage multi-omics association data to identify biological processes that are perturbed in disease, and 2) overlay the disease-associated processes onto molecular interaction networks to pinpoint hubs as potential key regulators. Unlike existing tools, which are mostly dedicated to a specific data type or setting, the Mergeomics pipeline accepts and integrates datasets across platforms, data types and species. We optimized and evaluated the performance of Mergeomics using simulation and multiple independent datasets, and benchmarked the results against alternative methods. We also demonstrate the versatility of Mergeomics in two case studies that include genome-wide, epigenome-wide and transcriptome-wide datasets from human and mouse studies of total cholesterol and fasting glucose. In both cases, the Mergeomics pipeline provided statistical and contextual evidence to prioritize further investigations in the wet lab. The software implementation of Mergeomics is freely available as a Bioconductor R package.
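The two-stage idea described in the Results can be illustrated with a minimal Python sketch. This is not the Mergeomics implementation or its statistics: it substitutes a plain hypergeometric over-representation test for both stages, and every name in it (`score_gene_sets`, `score_hubs`, the toy genes, gene sets and network) is hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn from N, of which K are marked."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def score_gene_sets(gene_pvalues, gene_sets, alpha=0.05):
    """Stage 1 (assumed simplification): over-representation of
    trait-associated genes (p < alpha) within each gene set."""
    background = set(gene_pvalues)
    hits = {g for g, p in gene_pvalues.items() if p < alpha}
    scores = {}
    for name, genes in gene_sets.items():
        members = set(genes) & background
        overlap = len(members & hits)
        scores[name] = hypergeom_sf(overlap, len(background), len(hits), len(members))
    return scores


def score_hubs(network, disease_genes):
    """Stage 2 (assumed simplification): for each node, test whether its
    direct neighbors are enriched for the disease-associated genes."""
    nodes = set(network)
    disease = set(disease_genes) & nodes
    scores = {}
    for node, neighbors in network.items():
        neighborhood = (set(neighbors) & nodes) - {node}
        overlap = len(neighborhood & disease)
        scores[node] = hypergeom_sf(overlap, len(nodes), len(disease), len(neighborhood))
    return scores


# Hypothetical toy inputs: per-gene association p-values, two gene sets,
# and an undirected network stored as an adjacency dictionary.
toy_gene_pvalues = {"A": 0.001, "B": 0.01, "C": 0.02, "D": 0.5,
                    "E": 0.7, "F": 0.9, "G": 0.3, "H": 0.6}
toy_gene_sets = {"lipid_like": ["A", "B", "C", "D"],
                 "other": ["E", "F", "G", "H"]}
toy_network = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A", "E"],
               "E": ["D", "F"], "F": ["E"], "G": ["H"], "H": ["G"]}

set_pvalues = score_gene_sets(toy_gene_pvalues, toy_gene_sets)
top_set = min(set_pvalues, key=set_pvalues.get)

# Carry the associated members of the top-ranked set into the network stage.
disease_genes = [g for g in toy_gene_sets[top_set] if toy_gene_pvalues[g] < 0.05]
hub_pvalues = score_hubs(toy_network, disease_genes)
top_hub = min(hub_pvalues, key=hub_pvalues.get)

print(top_set, set_pvalues[top_set])
print(top_hub, hub_pvalues[top_hub])
```

The sketch keeps the two stages as separate functions to mirror the "independent modules" wording: the only thing passed from the first stage to the second is a list of genes, so either stage could be replaced without touching the other.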

Conclusion: Mergeomics is a flexible and robust computational pipeline for multidimensional data integration. It outperformed the alternative methods we benchmarked and is readily applicable to datasets from different studies, species and omics data types for the study of complex traits.

Keywords: Blood glucose; Cholesterol; Functional genomics; Gene networks; Integrative genomics; Key drivers; Mergeomics; Multidimensional data integration.

Publication types

Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Animals
Biomarkers
Computational Biology / methods*
Databases, Genetic
Disease Susceptibility*
Genome-Wide Association Study
Glucose / metabolism
Humans
Polymorphism, Single Nucleotide
Reproducibility of Results
Software*
Web Browser

Substances

Biomarkers
Glucose

Grants and funding

R01 DK104363/DK/NIDDK NIH HHS/United States