High-quality peptide evidence for annotating non-canonical open reading frames as human proteins

Eric W Deutsch; Leron W Kok; Jonathan M Mudge; Jorge Ruiz-Orera; Ivo Fierro-Monti; Zhi Sun; Jennifer G Abelin; M Mar Alba; Julie L Aspden; Ariel A Bazzini; Elspeth A Bruford; Marie A Brunet; Lorenzo Calviello; Steven A Carr; Anne-Ruxandra Carvunis; Sonia Chothani; Jim Clauwaert; Kellie Dean; Pouya Faridi; Adam Frankish; Norbert Hubner; Nicholas T Ingolia; Michele Magrane; Maria Jesus Martin; Thomas F Martinez; Gerben Menschaert; Uwe Ohler; Sandra Orchard; Owen Rackham; Xavier Roucou; Sarah A Slavoff; Eivind Valen; Aaron Wacholder; Jonathan S Weissman; Wei Wu; Zhi Xie; Jyoti Choudhary; Michal Bassani-Sternberg; Juan Antonio Vizcaíno; Nicola Ternette; Robert L Moritz; John R Prensner; Sebastiaan van Heesch

doi:10.1101/2024.09.09.612016

bioRxiv [Preprint]. 2024 Sep 9:2024.09.09.612016. doi: 10.1101/2024.09.09.612016.

Authors

Eric W Deutsch¹, Leron W Kok^{2,3}, Jonathan M Mudge⁴, Jorge Ruiz-Orera⁵, Ivo Fierro-Monti⁴, Zhi Sun¹, Jennifer G Abelin⁶, M Mar Alba^{7,8}, Julie L Aspden⁹, Ariel A Bazzini^{10,11}, Elspeth A Bruford¹², Marie A Brunet^{13,14}, Lorenzo Calviello¹⁵, Steven A Carr⁶, Anne-Ruxandra Carvunis^{16,17}, Sonia Chothani¹⁸, Jim Clauwaert^{19,20}, Kellie Dean²¹, Pouya Faridi^{22,23}, Adam Frankish⁴, Norbert Hubner^{5,24,25,26}, Nicholas T Ingolia²⁷, Michele Magrane⁴, Maria Jesus Martin⁴, Thomas F Martinez^{28,29,30}, Gerben Menschaert³¹, Uwe Ohler^{32,33}, Sandra Orchard⁴, Owen Rackham³⁴, Xavier Roucou³⁵, Sarah A Slavoff^{36,37,38}, Eivind Valen³⁹, Aaron Wacholder^{16,17}, Jonathan S Weissman^{40,41,42,43}, Wei Wu^{44,45}, Zhi Xie⁴⁶, Jyoti Choudhary⁴⁷, Michal Bassani-Sternberg^{48,49,50}, Juan Antonio Vizcaíno⁴, Nicola Ternette^{51,52}, Robert L Moritz¹, John R Prensner^{19,20}, Sebastiaan van Heesch^{2,3}

Affiliations

¹ Institute for Systems Biology (ISB), Seattle, WA, 98109, USA.
² Princess Máxima Center for Pediatric Oncology, Utrecht, 3584 CS, The Netherlands.
³ Oncode Institute, Utrecht, The Netherlands.
⁴ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK.
⁵ Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, 13125, Germany.
⁶ Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
⁷ Hospital del Mar Research Institute, Barcelona, Spain.
⁸ Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain.
⁹ School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK.
¹⁰ Stowers Institute for Medical Research, Kansas City, MO, 64110, USA.
¹¹ Department of Molecular and Integrative Physiology, University of Kansas Medical Center, Kansas City, KS, 66160, USA.
¹² HUGO Gene Nomenclature Committee (HGNC), Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge, UK.
¹³ Pediatrics Department, University of Sherbrooke, Sherbrooke, Québec, Canada.
¹⁴ Centre de Recherche du Centre hospitalier universitaire de Sherbrooke (CRCHUS), Sherbrooke, Québec, Canada.
¹⁵ Human Technopole, Milan, 20157, Italy.
¹⁶ Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
¹⁷ Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
¹⁸ Centre for Computational Biology and Program in Cardiovascular and Metabolic Disorders, Duke-NUS (National University of Singapore) Medical School, Singapore.
¹⁹ Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
²⁰ Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
²¹ School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland.
²² Centre for Cancer Research, Hudson Institute of Medical Research, Clayton, VIC, Australia.
²³ Monash Proteomics & Metabolomics Platform, Department of Medicine, School of Clinical Sciences, Monash University, Clayton, VIC, Australia.
²⁴ Charité-Universitätsmedizin Berlin, Berlin, 10117, Germany.
²⁵ Helmholtz-Institute for Translational AngioCardioScience (HI-TAC) of the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC) at Heidelberg University, Heidelberg, 69117, Germany.
²⁶ DZHK (German Center for Cardiovascular Research), Partner Site Berlin, Berlin, 13347, Germany.
²⁷ Department of Molecular and Cell Biology, Center for Computational Biology, University of California, Berkeley, Berkeley, CA, 94720-3202, USA.
²⁸ Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA, 92617, USA.
²⁹ Department of Biological Chemistry, University of California, Irvine, Irvine, CA, 92617, USA.
³⁰ Chao Family Comprehensive Cancer Center, University of California, Irvine, Irvine, CA, 92617, USA.
³¹ Biobix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium.
³² Department of Biology, Humboldt University Berlin, Berlin, 10117, Germany.
³³ Berlin Institute of Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, 10115, Germany.
³⁴ University of Southampton, Southampton, UK.
³⁵ Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, Québec, Canada.
³⁶ Department of Chemistry, Yale University, New Haven, CT, 06520, USA.
³⁷ Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA.
³⁸ Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, 06516, USA.
³⁹ Department of Biosciences, University of Oslo, Oslo, Norway.
⁴⁰ Whitehead Institute for Biomedical Research, Cambridge, MA, 02142, USA.
⁴¹ Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA.
⁴² Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, 02138, USA.
⁴³ David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
⁴⁴ Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore.
⁴⁵ Department of Pharmacy & Pharmaceutical Sciences, National University of Singapore (NUS), Singapore.
⁴⁶ State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.
⁴⁷ Functional Proteomics Group, Institute of Cancer Research, Chester Beatty Labs, London, SW3 6JB, UK.
⁴⁸ Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, 1005, Switzerland.
⁴⁹ Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Lausanne, 1005, Switzerland.
⁵⁰ Agora Cancer Research Centre, Lausanne, 1011, Switzerland.
⁵¹ School of Life Sciences, Division Cell Signalling and Immunology, University of Dundee, Dundee, DD1 5EH, UK.
⁵² Centre for Immuno-Oncology, University of Oxford, Oxford, OX3 7DQ, UK.

Abstract

A major scientific drive is to characterize the protein-coding genome as it provides the primary basis for the study of human health. But the fundamental question remains: what has been missed in prior genomic analyses? Over the past decade, the translation of non-canonical open reading frames (ncORFs) has been observed across human cell types and disease states, with major implications for proteomics, genomics, and clinical science. However, the impact of ncORFs has been limited by the absence of a large-scale understanding of their contribution to the human proteome. Here, we report the collaborative efforts of stakeholders in proteomics, immunopeptidomics, Ribo-seq ORF discovery, and gene annotation, to produce a consensus landscape of protein-level evidence for ncORFs. We show that at least 25% of a set of 7,264 ncORFs give rise to translated gene products, yielding over 3,000 peptides in a pan-proteome analysis encompassing 3.8 billion mass spectra from 95,520 experiments. With these data, we developed an annotation framework for ncORFs and created public tools for researchers through GENCODE and PeptideAtlas. This work will provide a platform to advance ncORF-derived proteins in biomedical discovery and, beyond humans, diverse animals and plants where ncORFs are similarly observed.

Keywords: GENCODE; Human Proteome Project; Ribo-seq; immunopeptidomics; mass spectrometry; microproteins; non-canonical ORFs; proteomics; translation.

Publication types

Preprint
