MotifGenie: a Python application for searching transcription factor binding sequences using ChIP-Seq datasets

Bioinformatics. 2021 Nov 18;37(22):4238-4239. doi: 10.1093/bioinformatics/btab379.

Abstract

Motivation: Next-generation sequencing has enabled the rapid accumulation of genomic data in public repositories. Combined with chromatin immunoprecipitation (ChIP-Seq), it has also made it possible to better understand how transcription factors (TFs) and various chromatin-associated proteins regulate gene expression. The Cistrome Project has become an indispensable research portal for biologists to access and analyze data generated by thousands of ChIP-Seq experiments. However, integrative motif analysis on binding regions shared among a set of experiments is not yet achievable, despite the search and analysis tools that Cistrome provides via its web interface and the Galaxy framework.

Results: We implemented a Python command-line tool that searches for binding sequences of a TF common to multiple ChIP-Seq experiments. We use the peaks in the Cistrome database, as identified by MACS 2.0 for each experiment, and identify shared peak regions within a genomic locus of interest. We then scan these regions for binding sequences using a binding motif of the TF obtained from the JASPAR database. MotifGenie was developed in collaboration with molecular biologists, and its findings are corroborated by laboratory experiments.
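
The two steps described above (intersecting peaks across experiments, then scanning the shared regions with a motif) can be illustrated with a minimal Python sketch. This is not MotifGenie's actual code: the function names, the half-open (start, end) interval representation, the pseudocount, the uniform background and the score threshold are all assumptions made for illustration. The sketch does not read Cistrome peak files or JASPAR matrices, and it scans only the forward strand.

```python
import math

BASES = "ACGT"


def shared_regions(peak_sets, locus_start, locus_end):
    """Return intervals inside the locus that are covered by a peak in
    every experiment. Intervals are half-open (start, end) tuples.
    Assumes peaks within one experiment do not overlap each other."""
    current = [(locus_start, locus_end)]
    for peaks in peak_sets:
        overlaps = []
        for a_start, a_end in current:
            for b_start, b_end in peaks:
                start, end = max(a_start, b_start), min(a_end, b_end)
                if start < end:  # keep only non-empty overlaps
                    overlaps.append((start, end))
        current = sorted(overlaps)
    return current


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts (dict: base -> list of counts)
    into a list of per-position log2-odds scores.
    The pseudocount and uniform background are illustrative assumptions."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Slide the motif along the forward strand and return
    (offset, window, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other symbols
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((offset, window, score))
    return hits


# Toy data (hypothetical): three experiments with peaks on one chromosome.
experiments = [
    [(100, 200), (300, 400)],
    [(150, 350)],
    [(120, 180), (320, 500)],
]
regions = shared_regions(experiments, 0, 1000)

# Toy count matrix for a 4-bp motif whose consensus is TGAC.
toy_counts = {
    "A": [0, 0, 10, 0],
    "C": [0, 0, 0, 10],
    "G": [0, 10, 0, 0],
    "T": [10, 0, 0, 0],
}
hits = scan("AATGACGG", log_odds_matrix(toy_counts), threshold=5.0)
```

In practice, the sequence passed to `scan` would be the reference genome sequence of each shared region, and hit offsets would be translated back to genomic coordinates by adding the region start. A fuller version would also score the reverse complement of each window.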

Availability and implementation: MotifGenie is freely available at https://github.com/ceragoguztuzun/MotifGenie.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites / genetics
  • Chromatin Immunoprecipitation
  • Chromatin Immunoprecipitation Sequencing*
  • Sequence Analysis, DNA
  • Transcription Factors* / metabolism

Substances

  • Transcription Factors