AIRR community curation and standardised representation for immunoglobulin and T cell receptor germline sets

William D Lees; Scott Christley; Ayelet Peres; Justin T Kos; Brian Corrie; Duncan Ralph; Felix Breden; Lindsay G Cowell; Gur Yaari; Martin Corcoran; Gunilla B Karlsson Hedestam; Mats Ohlin; Andrew M Collins; Corey T Watson; Christian E Busse; AIRR Community

doi:10.1016/j.immuno.2023.100025

Immunoinformatics (Amst). 2023 Jun:10:100025. doi: 10.1016/j.immuno.2023.100025. Epub 2023 Feb 19.

Authors

William D Lees¹,², Scott Christley³, Ayelet Peres⁴, Justin T Kos⁵, Brian Corrie⁶, Duncan Ralph⁷, Felix Breden⁶, Lindsay G Cowell⁸, Gur Yaari⁴, Martin Corcoran⁹, Gunilla B Karlsson Hedestam⁹, Mats Ohlin¹⁰, Andrew M Collins¹¹, Corey T Watson⁵, Christian E Busse¹²; AIRR Community

Affiliations

¹ Institute of Structural and Molecular Biology, Birkbeck College, London, England.
² Human-Centered Computing and Information Science, Institute for Systems and Computer Engineering Technology and Science, Porto, Portugal.
³ Peter O'Donnell Jr. School of Public Health, UT Southwestern Medical Center, Dallas, TX, USA.
⁴ Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel.
⁵ Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, KY, USA.
⁶ Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada.
⁷ Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
⁸ Peter O'Donnell Jr. School of Public Health, Department of Immunology, School of Biomedical Sciences, UT Southwestern Medical Center, Dallas, TX, USA.
⁹ Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden.
¹⁰ Department of Immunotechnology and SciLifeLab, Lund University, Lund, Sweden.
¹¹ School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia.
¹² Division of B Cell Immunology, German Cancer Research Center, Heidelberg, Germany.

Abstract

Analysis of an individual's immunoglobulin or T cell receptor gene repertoire can provide important insights into immune function. High-quality analysis of adaptive immune receptor repertoire sequencing data depends upon accurate and relatively complete germline sets, but current sets are known to be incomplete. Established processes for the review and systematic naming of receptor germline genes and alleles require specific evidence and data types, but the discovery landscape is rapidly changing. To exploit the potential of emerging data, and to provide the field with improved state-of-the-art germline sets, an intermediate approach is needed that will allow the rapid publication of consolidated sets derived from these emerging sources. These sets must use a consistent naming scheme and allow refinement and consolidation into genes as new information emerges. Name changes should be minimised, but, where changes occur, the naming history of a sequence must be traceable. Here we outline the current issues and opportunities for the curation of germline IG/TR genes and present a forward-looking data model for building out more robust germline sets that can dovetail with current established processes. We describe interoperability standards for germline sets, and an approach to transparency based on principles of findability, accessibility, interoperability, and reusability.

Keywords: AIRR-seq; Immune receptor; Immune receptor germline; Immune receptor repertoire; Rep-seq.
