A sequence-based, deep learning model accurately predicts RNA splicing branchpoints

Joseph M Paggi; Gill Bejerano

doi:10.1261/rna.066290.118

RNA. 2018 Dec;24(12):1647-1658. doi: 10.1261/rna.066290.118. Epub 2018 Sep 17.

Authors

Joseph M Paggi¹, Gill Bejerano¹ ² ³ ⁴

Affiliations

¹ Department of Computer Science, Stanford University, Stanford, California 94305, USA.
² Department of Developmental Biology, Stanford University, Stanford, California 94305, USA.
³ Department of Pediatrics, Stanford University, Stanford, California 94305, USA.
⁴ Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA.

Abstract

Experimental detection of RNA splicing branchpoints is difficult. To date, high-confidence experimental annotations exist for 18% of 3' splice sites in the human genome. We develop a deep-learning-based branchpoint predictor, LaBranchoR, which predicts a correct branchpoint for at least 75% of 3' splice sites genome-wide. Detailed analysis of cases in which our predicted branchpoint deviates from experimental data suggests a correct branchpoint is predicted in over 90% of cases. We use our predicted branchpoints to identify a novel sequence element upstream of branchpoints consistent with extended U2 snRNA base-pairing, show an association between weak branchpoints and alternative splicing, and explore the effects of genetic variants on branchpoints. We provide genome-wide branchpoint annotations and in silico mutagenesis scores at http://bejerano.stanford.edu/labranchor.

Keywords: RNA splicing; RNA splicing branchpoints; alternative splicing; deep learning.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Alternative Splicing / genetics*
Computer Simulation
Deep Learning
Exons / genetics
Genome, Human / genetics*
Humans
Introns / genetics
Molecular Sequence Annotation
Mutagenesis / genetics
RNA Splice Sites / genetics
RNA Splicing / genetics*
RNA, Small Nuclear / genetics*

Substances

RNA Splice Sites
RNA, Small Nuclear
U2 small nuclear RNA
