Beav: a bacterial genome and mobile element annotation pipeline

mSphere. 2024 Aug 28;9(8):e0020924. doi: 10.1128/msphere.00209-24. Epub 2024 Jul 22.

Abstract

Comprehensive and accurate genome annotation is crucial for inferring the predicted functions of an organism. Numerous tools exist to annotate genes, gene clusters, mobile genetic elements, and other diverse features. However, these tools and pipelines can be difficult to install and run, be specialized for a particular element or feature, or lack annotations for larger elements that provide important genomic context. Integrating results across analyses is also important for understanding gene function. To address these challenges, we present the Beav annotation pipeline. Beav is a command-line tool that automates the annotation of bacterial genome sequences, mobile genetic elements, molecular systems and gene clusters, key regulatory features, and other elements. Beav uses existing tools in addition to custom models, scripts, and databases to annotate diverse elements, systems, and sequence features. Custom databases for plant-associated microbes are incorporated to improve annotation of key virulence and symbiosis genes in agriculturally important pathogens and mutualists. Beav includes an optional Agrobacterium-specific pipeline that identifies and classifies oncogenic plasmids and annotates plasmid-specific features. Following the completion of all analyses, annotations are consolidated to produce a single comprehensive output. Finally, Beav generates publication-quality genome and plasmid maps. Beav is on Bioconda and is available for download at https://github.com/weisberglab/beav.

Importance: Annotation of genome features, such as the presence of genes and their predicted function, or larger loci encoding secretion systems or biosynthetic gene clusters, is necessary for understanding the functions encoded by an organism. Genomes can also host diverse mobile genetic elements, such as integrative and conjugative elements and/or phages, that are often not annotated by existing pipelines. These elements can horizontally mobilize genes encoding for virulence, antimicrobial resistance, or other adaptive functions and alter the phenotype of an organism. We developed a software pipeline, called Beav, that combines new and existing tools for the comprehensive annotation of these and other major features. Existing pipelines often misannotate loci important for virulence or mutualism in plant-associated bacteria. Beav includes custom databases and optional workflows for the improved annotation of plant-associated bacteria. Beav is designed to be easy to install and run, making comprehensive genome annotation broadly available to the research community.

Keywords: Agrobacterium tumefaciens; annotation; genomics; mobile genetic elements; plant-microbe interactions.

MeSH terms

  • Bacteria / classification
  • Bacteria / genetics
  • Computational Biology / methods
  • Databases, Genetic
  • Genome, Bacterial*
  • Genomics / methods
  • Interspersed Repetitive Sequences*
  • Molecular Sequence Annotation* / methods
  • Plasmids / genetics
  • Software