Identification and quantification of small exon-containing isoforms in long-read RNA sequencing data

Nucleic Acids Res. 2023 Nov 10;51(20):e104. doi: 10.1093/nar/gkad810.

Abstract

Small exons are pervasive in transcriptomes across organisms, and quantifying them in RNA isoforms is crucial for understanding gene function. Although long-read RNA-seq based on Oxford Nanopore Technologies (ONT) can cover transcripts in full length, its lower base accuracy makes it difficult to identify individual exons, particularly microexons (≤ 30 nucleotides). Here, we systematically assess small exon quantification in synthetic and human ONT RNA-seq datasets. We show that reads containing small exons are often misaligned, which affects the quantification of the corresponding transcripts. We therefore develop MisER, a local-realignment method for misaligned exons that remaps reads with misaligned exons to the transcript references. Using synthetic and simulated datasets, we demonstrate the high sensitivity and specificity of MisER for quantifying transcripts that contain small exons. Moreover, by comparing 14 neural with 16 non-neural human tissues, MisER enabled us to identify small exons, particularly neural-regulated microexons, with a higher percent spliced-in index (PSI) in neural tissues. Our work introduces an improved quantification method for long-read RNA-seq and, in particular, facilitates the use of ONT long reads to study the regulation of genes involving small exons.
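The abstract does not describe MisER's realignment procedure in detail. As an illustration of the general idea only, the following toy Python sketch compares a read segment spanning a small-exon junction against two candidate transcript sequences (exon included vs. exon skipped) and assigns the read to the closer one. It then computes PSI with its standard definition: inclusion reads divided by inclusion plus exclusion reads. The function names, the plain Levenshtein scoring, and the assumption that the read segment has already been extracted between the flanking exons are all hypothetical simplifications, not MisER's actual algorithm.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def classify_read(segment: str, upstream: str, exon: str, downstream: str) -> str:
    """Hypothetical helper: assign a read segment to the inclusion or skipping isoform.

    Assumes `segment` is the part of the read lying between the upstream and
    downstream exons, extracted beforehand (a simplification for this sketch).
    """
    d_incl = edit_distance(segment, upstream + exon + downstream)
    d_skip = edit_distance(segment, upstream + downstream)
    if d_incl < d_skip:
        return "inclusion"
    if d_skip < d_incl:
        return "skipping"
    return "ambiguous"  # equal scores: leave the read unassigned


def psi(inclusion_count: int, exclusion_count: int) -> float:
    """Percent spliced-in as a fraction; NaN when no informative reads exist."""
    total = inclusion_count + exclusion_count
    if total == 0:
        return math.nan
    return inclusion_count / total
```

For example, with a hypothetical 6-nt microexon `exon = "GGATCC"` flanked by `upstream = "ACGTACGTAC"` and `downstream = "TTGCATGCAA"`, a read segment carrying one substitution inside the microexon (`"ACGTACGTACGGTTCCTTGCATGCAA"`) is still assigned to `"inclusion"`, because a single mismatch costs far less than the six deleted bases needed to match the skipping isoform. Real ONT data would require proper spliced alignment rather than global edit distance over pre-cut segments.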

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Exons*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Protein Isoforms / genetics
  • RNA
  • RNA Isoforms* / genetics
  • RNA-Seq
  • Sequence Analysis, RNA* / methods
  • Transcriptome

Substances

  • Protein Isoforms
  • RNA
  • RNA Isoforms