DIAproteomics: A Multifunctional Data Analysis Pipeline for Data-Independent Acquisition Proteomics and Peptidomics

Leon Bichmann; Shubham Gupta; George Rosenberger; Leon Kuchenbecker; Timo Sachsenberg; Phil Ewels; Oliver Alka; Julianus Pfeuffer; Oliver Kohlbacher; Hannes Röst

doi:10.1021/acs.jproteome.1c00123

J Proteome Res. 2021 Jul 2;20(7):3758-3766. doi: 10.1021/acs.jproteome.1c00123. Epub 2021 Jun 21.

Authors

Leon Bichmann¹˒², Shubham Gupta³, George Rosenberger⁴, Leon Kuchenbecker¹, Timo Sachsenberg¹, Phil Ewels⁵, Oliver Alka¹, Julianus Pfeuffer¹˒⁶˒⁷, Oliver Kohlbacher¹˒⁸˒⁹, Hannes Röst³

Affiliations

¹ Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany.
² Institute for Cell Biology, Department of Immunology, University of Tübingen, Tübingen 72076, Germany.
³ Donnelly Center for Biomolecular Research, University of Toronto, Toronto, Ontario ON M5S 3E1, Canada.
⁴ Department of Systems Biology, Columbia University, New York, New York 10032, United States.
⁵ Science for Life Laboratory (SciLifeLab), Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden.
⁶ Institute for Informatics, Freie Universität Berlin, Berlin 14195, Germany.
⁷ Zuse Institute Berlin, Berlin 14195, Germany.
⁸ Institute for Biological and Medical Informatics, University of Tübingen, Tübingen 72076, Germany.
⁹ Institute for Translational Bioinformatics, University Hospital Tübingen, Tübingen 72076, Germany.

PMID: 34153189
DOI: 10.1021/acs.jproteome.1c00123

Abstract

Data-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. The main advantages include greater reproducibility and sensitivity and a greater dynamic range compared with data-dependent acquisition (DDA). However, the data analysis is complex and often requires expert knowledge when dealing with large-scale data sets. Here we present DIAproteomics, a multifunctional, automated, high-throughput pipeline implemented in the Nextflow workflow management system that allows one to easily process proteomics and peptidomics DIA data sets on diverse compute infrastructures. The central components are well-established tools such as the OpenSwathWorkflow for the DIA spectral library search and PyProphet for the false discovery rate assessment. In addition, it provides options to generate spectral libraries from existing DDA data and to carry out the retention time and chromatogram alignment. The output includes annotated tables and diagnostic visualizations from the statistical postprocessing and computation of fold-changes across pairwise conditions, predefined in an experimental design. DIAproteomics is well-documented, open-source software and is available under a permissive license to the scientific community at https://www.openms.de/diaproteomics/.

Keywords: automation; cloud computing; data processing; data-independent acquisition; peptidomics; proteomics; spectral library generation.
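
The abstract mentions fold-changes computed across pairwise conditions that are predefined in an experimental design. As a rough illustration of that idea only, the sketch below computes a log2 ratio of mean intensities for every pair of conditions. It is not the pipeline's actual statistical postprocessing: the function name `pairwise_log2_fold_changes`, the dictionary layouts, the sample and peptide names, and the plain ratio-of-means estimator are all assumptions made for this example.

```python
from itertools import combinations
from math import log2
from statistics import mean


def pairwise_log2_fold_changes(intensities, design):
    """Hypothetical helper, not part of DIAproteomics.

    intensities: {sample: {peptide: intensity}}
    design:      {sample: condition}
    Returns {(cond_a, cond_b): {peptide: log2(mean_b / mean_a)}}.
    """
    # Group the samples by the condition assigned in the experimental design.
    by_condition = {}
    for sample, condition in design.items():
        by_condition.setdefault(condition, []).append(sample)

    def condition_means(condition):
        # Average each peptide over the samples in which it was quantified.
        values = {}
        for sample in by_condition[condition]:
            for peptide, intensity in intensities.get(sample, {}).items():
                if intensity > 0:
                    values.setdefault(peptide, []).append(intensity)
        return {peptide: mean(v) for peptide, v in values.items()}

    means = {c: condition_means(c) for c in by_condition}

    result = {}
    for a, b in combinations(sorted(means), 2):
        # Only peptides quantified in both conditions yield a ratio.
        shared = means[a].keys() & means[b].keys()
        result[(a, b)] = {
            p: log2(means[b][p] / means[a][p]) for p in sorted(shared)
        }
    return result


# Made-up toy data for illustration only.
design = {"s1": "control", "s2": "control", "s3": "treated", "s4": "treated"}
intensities = {
    "s1": {"PEPTIDEA": 100.0, "PEPTIDEB": 50.0},
    "s2": {"PEPTIDEA": 300.0},
    "s3": {"PEPTIDEA": 800.0, "PEPTIDEB": 25.0},
    "s4": {"PEPTIDEA": 800.0},
}
fold_changes = pairwise_log2_fold_changes(intensities, design)
print(fold_changes[("control", "treated")])
# {'PEPTIDEA': 2.0, 'PEPTIDEB': -1.0}
```

A ratio of means ignores variance and missing-value structure, so a real analysis would typically rely on a dedicated statistical model rather than this simplification.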

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Data Analysis*
Mass Spectrometry
Proteomics*
Reproducibility of Results
Software