Uncovering Hidden Members and Functions of the Soil Microbiome Using De Novo Metaproteomics

Joon-Yong Lee; Hugh D Mitchell; Meagan C Burnet; Ruonan Wu; Sarah C Jenson; Eric D Merkley; Ernesto S Nakayasu; Carrie D Nicora; Janet K Jansson; Kristin E Burnum-Johnson; Samuel H Payne

doi:10.1021/acs.jproteome.2c00334


J Proteome Res. 2022 Aug 5;21(8):2023-2035. doi: 10.1021/acs.jproteome.2c00334. Epub 2022 Jul 6.

Affiliations

¹ Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
² Signature Sciences and Technology Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
³ Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
⁴ Biology Department, Brigham Young University, Provo, Utah 84602, United States.

Abstract

Metaproteomics has been increasingly used for high-throughput characterization of proteins in complex environments and has been shown to provide insights into microbial composition and functional roles. However, significant challenges remain in metaproteomic data analysis, including the creation of a sample-specific protein sequence database. A well-matched database is a requirement for successful metaproteomic analysis, and the accuracy and sensitivity of peptide-spectrum match (PSM) identification algorithms suffer when the database is incomplete or contains extraneous sequences. When matched DNA sequencing data for the sample are unavailable or incomplete, creating a proteome database that accurately represents the organisms in the sample is a challenge. Here, we leverage a de novo peptide sequencing approach to identify the sample composition directly from metaproteomic data. First, we created a deep learning model, Kaiko, to predict peptide sequences from mass spectrometry data and trained it on 5 million peptide-spectrum matches from 55 phylogenetically diverse bacteria. After training, Kaiko successfully identified organisms from soil isolates and synthetic communities directly from proteomics data. Finally, we created a pipeline for metaproteome database generation using Kaiko. We tested the pipeline on native soils collected in Kansas, showing that the de novo sequencing model can serve as an alternative and complementary method for constructing the sample-specific protein database instead of relying on (un)matched metagenomes. Our pipeline identified all highly abundant taxa detected by 16S rRNA sequencing of the soil samples and uncovered several additional species that were strongly represented only in the proteomic data.
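The abstract does not spell out how de novo peptide predictions are turned into a sample-specific database, so the following is only a minimal sketch of one plausible approach, not the Kaiko pipeline itself. It assumes exact substring matching of predicted peptides against a toy set of reference proteomes, counts peptides that map to exactly one taxon, and keeps taxa that reach a hypothetical `min_hits` threshold. The function names (`normalize`, `count_unique_hits`, `build_database`) and all data are hypothetical. Because leucine and isoleucine have identical masses, the sketch treats I and L as equivalent.

```python
from collections import Counter


def normalize(seq: str) -> str:
    # Leucine and isoleucine have identical masses, so treat I and L as equivalent.
    return seq.upper().replace("I", "L")


def count_unique_hits(peptides, reference):
    """Count, per taxon, predicted peptides that occur in that taxon's proteins only."""
    norm_ref = {taxon: [normalize(p) for p in prots] for taxon, prots in reference.items()}
    hits = Counter()
    for pep in map(normalize, peptides):
        # Hypothetical matching rule: exact substring of any reference protein.
        matched = {taxon for taxon, prots in norm_ref.items()
                   if any(pep in prot for prot in prots)}
        if len(matched) == 1:  # ignore peptides shared across taxa
            hits[matched.pop()] += 1
    return hits


def build_database(peptides, reference, min_hits=2):
    """Keep taxa with >= min_hits unique peptides; return (taxa, protein sequences)."""
    hits = count_unique_hits(peptides, reference)
    selected = sorted(taxon for taxon, n in hits.items() if n >= min_hits)
    proteins = [prot for taxon in selected for prot in reference[taxon]]
    return selected, proteins


if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    reference = {
        "TaxonA": ["MKLLVAGTPEKR", "MSDNLLQTAGER"],
        "TaxonB": ["MKLLVAGTPEKR", "MAAVTEGHWKR"],
        "TaxonC": ["MPPQRSTYVK"],
    }
    peptides = ["LLVAGTPEK", "DNLLQTAGER", "SDNIIQ", "AAVTEGHWK", "QRSTYVK"]
    taxa, db = build_database(peptides, reference)
    print(taxa)     # ['TaxonA']
    print(len(db))  # 2
```

In this toy run, `LLVAGTPEK` is shared by TaxonA and TaxonB and is discarded, while TaxonA collects two unique peptides (one only after the I/L normalization), so only its proteins enter the database. A real pipeline would need tolerant matching for de novo sequencing errors and a far larger reference set; counting only taxon-specific peptides is one simple way to avoid inflating taxa that share conserved proteins.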

Keywords: de novo sequencing; deep learning model; metaproteomics; soil microbiome.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Microbiota* / genetics
Peptides / analysis
Peptides / genetics
Proteome / genetics
Proteomics* / methods
RNA, Ribosomal, 16S / genetics
Soil

Substances

Peptides
Proteome
RNA, Ribosomal, 16S
Soil