RNA-GPS predicts high-resolution RNA subcellular localization and highlights the role of splicing

RNA. 2020 Jul;26(7):851-865. doi: 10.1261/rna.074161.119. Epub 2020 Mar 27.

Abstract

Subcellular localization is essential to RNA biogenesis, processing, and function across the gene expression life cycle. However, the specific nucleotide sequence motifs that direct RNA localization are incompletely understood. Fortunately, new sequencing technologies have provided transcriptome-wide atlases of RNA localization, creating an opportunity to leverage computational modeling. Here we present RNA-GPS, a new machine learning model that uses nucleotide-level features to predict RNA localization across eight different subcellular locations, the first model to provide such a wide range of predictions. RNA-GPS's design enables high-throughput sequence ablation and feature importance analyses to probe the sequence motifs that drive localization prediction. We find localization-informative motifs to be concentrated in 3'-UTRs and scattered along the coding sequence. We also find motifs related to splicing to be important drivers of predicted localization, even for cytotopic distinctions between membraneless bodies within the nucleus or between organelles within the cytoplasm. Overall, our results suggest transcript splicing is one of many elements influencing RNA subcellular localization.

Keywords: RNA localization; localization mechanism; machine learning model; splicing in localization.
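
The abstract mentions nucleotide-level features and sequence ablation without giving details. As a rough illustration only, the Python sketch below shows one common way such ideas can be realized: k-mer frequency features, and a masking scan that measures how a score changes when each window of the sequence is hidden. The function names, the choice of k-mers, the window size, and the toy scoring function are all assumptions made for this example. They are not taken from the RNA-GPS implementation.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

ALPHABET = "ACGU"  # assumption: sequences are given as RNA, not DNA


def kmer_frequencies(seq: str, k: int = 3) -> Dict[str, float]:
    """Relative frequency of every k-mer over ACGU.

    Windows containing any other symbol (for example a masked 'N')
    are skipped and do not count toward the total.
    """
    counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: c / total for kmer, c in counts.items()}


def ablation_scan(
    seq: str,
    score: Callable[[str], float],
    window: int = 5,
    step: int = 5,
) -> List[Tuple[int, float]]:
    """Mask each window with 'N' and record the drop in score.

    Returns (window start, baseline score minus masked score) pairs.
    A large positive value means the window supported the prediction.
    """
    baseline = score(seq)
    effects = []
    for start in range(0, len(seq) - window + 1, step):
        masked = seq[:start] + "N" * window + seq[start + window:]
        effects.append((start, baseline - score(masked)))
    return effects


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model's output.

    It simply counts occurrences of the motif 'AUG'; a real model
    would return a predicted localization probability instead.
    """
    return float(seq.count("AUG"))
```

Calling `ablation_scan("AUGAUGCCCC", toy_score)` returns one entry per masked window, and windows that overlap the toy motif show the largest score drop. In practice `score` would wrap a trained classifier applied to features such as those from `kmer_frequencies`.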

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • 3' Untranslated Regions / genetics
  • Alternative Splicing / genetics*
  • Cell Line, Tumor
  • Cell Nucleus / genetics
  • Computational Biology / methods
  • Cytoplasm / genetics
  • HeLa Cells
  • Humans
  • K562 Cells
  • RNA / genetics*
  • Sequence Analysis, RNA / methods
  • Transcriptome / genetics

Substances

  • 3' Untranslated Regions
  • RNA