Probing instructions for expression regulation in gene nucleotide compositions

PLoS Comput Biol. 2018 Jan 2;14(1):e1005921. doi: 10.1371/journal.pcbi.1005921. eCollection 2018 Jan.

Abstract

Gene expression is orchestrated by distinct regulatory regions to ensure a wide variety of cell types and functions. A challenge is to identify which regulatory regions are active, what their associated features are, and how they work together in each cell type. Several approaches have tackled this problem by modeling gene expression based on epigenetic marks, with the ultimate goal of identifying driving regions and associated genomic variations that are clinically relevant, in particular in precision medicine. However, these models rely on experimental data, which are limited to specific samples (often only to cell lines) and cannot be generated for all regulators and all patients. In addition, we show here that, although these approaches are accurate in predicting gene expression, inference of transcription factor (TF) combinations from this type of model is not straightforward. Furthermore, these methods are not designed to capture regulation instructions present at the sequence level, before the binding of regulators or the opening of the chromatin. Here, we probe sequence-level instructions for gene expression and develop a method to explain mRNA levels based solely on nucleotide features. Our method positions nucleotide composition as a critical component of gene expression. Moreover, our approach, which can rank regulatory regions according to their contribution, unveils a strong influence of the gene body sequence, in particular introns. We further provide evidence that the contribution of nucleotide content can be linked to co-regulations associated with genome 3D architecture and to associations of genes within topologically associated domains.
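
As an illustration of what "nucleotide features" could look like in practice, the following minimal Python sketch computes mono- and dinucleotide fractions for a single DNA sequence, such as an intron or a promoter region. The abstract does not specify the exact features or the statistical model used, so the function name `composition_features`, the choice of overlapping dinucleotides, and the handling of ambiguous bases such as `N` are all assumptions made for this example, not the authors' implementation.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def composition_features(seq: str) -> dict[str, float]:
    """Return mono- and dinucleotide fractions of a DNA sequence.

    Hypothetical helper for illustration only. Ambiguous bases
    (anything other than A, C, G, T) are ignored, which is an
    assumption of this sketch.
    """
    seq = seq.upper()
    features: dict[str, float] = {}

    # Mononucleotide fractions over unambiguous positions.
    mono = Counter(base for base in seq if base in NUCLEOTIDES)
    total_mono = sum(mono.values())
    for base in NUCLEOTIDES:
        features[base] = mono[base] / total_mono if total_mono else 0.0

    # Dinucleotide fractions over overlapping pairs in which
    # both bases are unambiguous.
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    di = Counter(p for p in pairs
                 if p[0] in NUCLEOTIDES and p[1] in NUCLEOTIDES)
    total_di = sum(di.values())
    for first, second in product(NUCLEOTIDES, repeat=2):
        key = first + second
        features[key] = di[key] / total_di if total_di else 0.0

    return features
```

Computing one such feature vector per regulatory region (for example promoter, introns, and UTRs) and per gene would give a table of predictors that a regression model could relate to measured mRNA levels. How the regions are delimited and which model is fitted are left open here.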

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Base Composition*
  • Computational Biology
  • DNA Copy Number Variations
  • Enhancer Elements, Genetic
  • Gene Expression Regulation*
  • Genome, Human
  • Humans
  • Models, Genetic
  • Neoplasms / genetics
  • Neoplasms / metabolism
  • Polymorphism, Single Nucleotide
  • Promoter Regions, Genetic
  • Quantitative Trait Loci
  • RNA, Messenger / chemistry
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Regulatory Sequences, Nucleic Acid*
  • Transcription Factors / genetics
  • Transcription Factors / metabolism

Substances

  • RNA, Messenger
  • Transcription Factors

Grants and funding

The work was supported by funding from CNRS, Plan d’Investissement d’Avenir #ANR-11-BINF-0002 Institut de Biologie Computationnelle (young investigator grant to CHL and post-doctoral fellowship to JV), Labex NUMEV (post-doctoral fellowship to JV), INSERM-ITMO Cancer project “LIONS” BIO2015-04. MT is a recipient of a CBS2-I2S joint doctoral fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.