MOCCASIN: a method for correcting for known and unknown confounders in RNA splicing analysis

Nat Commun. 2021 Jun 7;12(1):3353. doi: 10.1038/s41467-021-23608-9.

Abstract

The effects of confounding factors on gene expression analysis have been extensively studied following the introduction of high-throughput microarrays and subsequently RNA sequencing. In contrast, there is a lack of equivalent analysis and tools for RNA splicing. Here we first assess the effect of confounders on both expression and splicing quantifications in two large public RNA-Seq datasets (TARGET, ENCODE). We show quantification of splicing variations are affected at least as much as those of gene expression, revealing unwanted sources of variations in both datasets. Next, we develop MOCCASIN, a method to correct the effect of both known and unknown confounders on RNA splicing quantification and demonstrate MOCCASIN's effectiveness on both synthetic and real data. Code, synthetic and corrected datasets are all made available as resources.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Databases, Genetic
  • Gene Expression Profiling / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • K562 Cells
  • RNA Splicing*
  • RNA-Seq / methods
  • Reproducibility of Results
  • Software