Best genome sequencing strategies for annotation of complex immune gene families in wildlife

Emma Peel; Luke Silver; Parice Brandies; Ying Zhu; Yuanyuan Cheng; Carolyn J Hogg; Katherine Belov

doi:10.1093/gigascience/giac100

Gigascience. 2022 Oct 30:11:giac100. doi: 10.1093/gigascience/giac100.

Authors

Emma Peel¹,², Luke Silver¹, Parice Brandies¹, Ying Zhu³, Yuanyuan Cheng¹, Carolyn J Hogg¹,², Katherine Belov¹,²

Affiliations

¹ School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia.
² Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney, NSW 2006, Australia.
³ Sichuan Provincial Academy of Natural Resource Sciences, Chengdu, Sichuan 610000, China.

Abstract

Background: The biodiversity crisis and the increasing impact of wildlife disease on animal and human health provide impetus for studying immune genes in wildlife. Despite the recent boom in genomes for wildlife species, immune genes are poorly annotated in nonmodel species owing to their high level of polymorphism and complex genomic organisation. Our research over the past decade and a half on Tasmanian devils and koalas highlights the importance of genomics and accurate immune annotations for investigating disease in wildlife. Given this, we have increasingly been asked what minimum level of genome quality is required to effectively annotate immune genes in order to study immunogenetic diversity. Here we set out to answer this question by manually annotating immune genes in 5 marsupial genomes and 1 monotreme genome to determine the impact of sequencing data type, assembly quality, and automated annotation on accurate immune annotation.

Results: Genome quality is directly linked to our ability to annotate complex immune gene families, with long reads and scaffolding technologies required to reassemble immune gene clusters and elucidate evolution, organisation, and true gene content of the immune repertoire. Draft-quality genomes generated from short reads with HiC or 10× Chromium linked reads were unable to achieve this. Despite mammalian BUSCOv5 scores of up to 94.1% amongst the 6 genomes, automated annotation pipelines incorrectly annotated up to 59% of manually annotated immune genes regardless of assembly quality or method of automated annotation.

Conclusions: Our results demonstrate that long reads and scaffolding technologies, alongside manual annotation, are required to accurately study the immune gene repertoire of wildlife species.

Keywords: MHC; annotation; disease; genome; immune gene; quality; wildlife.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Animals, Wild* / genetics
Base Sequence
Genome
Genomics*
Humans
Mammals
Molecular Sequence Annotation