Gene count estimation with pytximport enables reproducible analysis of bulk RNA sequencing data in Python

Malte Kuehl; Milagros N Wong; Nicola Wanner; Stefan Bonn; Victor G Puelles

doi:10.1093/bioinformatics/btae700

Bioinformatics. 2024 Nov 20:btae700. doi: 10.1093/bioinformatics/btae700. Online ahead of print.

Authors

Malte Kuehl ¹ ² ³ ⁴, Milagros N Wong ¹ ² ⁵ ⁶, Nicola Wanner ⁵ ⁶, Stefan Bonn ³ ⁴, Victor G Puelles ¹ ² ⁵ ⁶

Affiliations

¹ Department of Clinical Medicine, Aarhus University, Aarhus, Denmark.
² Department of Pathology, Aarhus University Hospital, Aarhus, Denmark.
³ Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
⁴ Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
⁵ III. Department of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
⁶ Hamburg Center for Kidney Health, Hamburg, Germany.

PMID: 39565903
DOI: 10.1093/bioinformatics/btae700

Abstract

Summary: Transcript quantification tools efficiently map bulk RNA sequencing reads to reference transcriptomes. However, their output consists of transcript count estimates that are subject to multiple biases and cannot be readily used with existing differential gene expression analysis tools in Python. Here we present pytximport, a Python implementation of the tximport R package that supports a variety of input formats, different modes of bias correction, inferential replicates, gene-level summarization of transcript counts, transcript-level exports, transcript-to-gene mapping generation and optional filtering of transcripts by biotype. pytximport is part of the scverse ecosystem of open-source Python software packages for omics analyses and includes both a Python interface and a command-line interface. With pytximport, we propose a bulk RNA sequencing analysis workflow based on Bioconda and scverse ecosystem packages, ensuring reproducible analyses through Snakemake rules. We apply this pipeline to a publicly available RNA-sequencing dataset, demonstrating how pytximport enables the creation of Python-centric workflows capable of providing insights into transcriptomic alterations.
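To illustrate what gene-level summarization of transcript counts involves, the following is a minimal sketch in plain pandas. It does not use or reproduce the pytximport API. The function name `summarize_to_gene`, the column names and all numeric values are hypothetical and chosen only for this example. The sketch sums transcript counts and TPM per gene and computes an abundance-weighted average transcript length, which is the general idea behind tximport-style summarization. Bias-correction modes, inferential replicates and the handling of genes with zero abundance are left out.

```python
import pandas as pd

# Toy transcript-level quantification for a single sample.
# All identifiers and values are made up for illustration.
transcripts = pd.DataFrame(
    {
        "transcript_id": ["tx1", "tx2", "tx3"],
        "counts": [10.0, 30.0, 5.0],
        "tpm": [100.0, 300.0, 600.0],
        "effective_length": [1000.0, 2000.0, 500.0],
    }
)

# Hypothetical transcript-to-gene mapping.
tx2gene = pd.DataFrame(
    {
        "transcript_id": ["tx1", "tx2", "tx3"],
        "gene_id": ["geneA", "geneA", "geneB"],
    }
)


def summarize_to_gene(transcripts: pd.DataFrame, tx2gene: pd.DataFrame) -> pd.DataFrame:
    """Sum counts and TPM per gene and compute a TPM-weighted mean length."""
    # Transcripts that are missing from the mapping are dropped by the inner join.
    merged = transcripts.merge(tx2gene, on="transcript_id", how="inner")
    # Weight each transcript length by its abundance before summing.
    merged["tpm_x_length"] = merged["tpm"] * merged["effective_length"]
    grouped = merged.groupby("gene_id")[["counts", "tpm", "tpm_x_length"]].sum()
    # Genes with zero total TPM would yield NaN here; this sketch does not handle them.
    grouped["length"] = grouped["tpm_x_length"] / grouped["tpm"]
    return grouped[["counts", "tpm", "length"]]


gene_level = summarize_to_gene(transcripts, tx2gene)
print(gene_level)
```

In this toy input, geneA combines two transcripts, so its length is pulled toward the more abundant transcript tx2 rather than being a simple mean of the two lengths.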

Availability: pytximport is licensed under the GNU General Public License version 3. The source code is available at https://github.com/complextissue/pytximport and via Zenodo with DOI: 10.5281/zenodo.13907917. A related Snakemake workflow is available through GitHub at https://github.com/complextissue/snakemake-bulk-rna-seq-workflow and Zenodo with DOI: 10.5281/zenodo.12713811. Documentation and a vignette for new users are available at: https://pytximport.readthedocs.io.

Supplementary information: Supplementary Material is available at Bioinformatics online.