CANDy: Automated analysis of domain architectures in carbohydrate-active enzymes

PLoS One. 2024 Jul 11;19(7):e0306410. doi: 10.1371/journal.pone.0306410. eCollection 2024.

Abstract

Carbohydrate-active enzymes (CAZymes) are found in all domains of life and play a crucial role in metabolic and physiological processes. CAZymes often possess a modular structure, comprising not only catalytic domains but also associated domains such as carbohydrate-binding modules (CBMs) and linker domains. Exploring the modular diversity of CAZy families can uncover catalysts with novel properties and yield further insight into their biological functions and evolutionary relationships. Here we present the carbohydrate-active enzyme domain analysis tool (CANDy), an assembly of several novel scripts, tools and databases that allows users to analyze the domain architecture of all protein sequences in a given CAZy family. CANDy's usability is demonstrated on glycoside hydrolase family 48, a small yet underexplored family containing multi-domain enzymes. Our analysis reveals 35 distinct domain assemblies, including eight known architectures; the remaining assemblies await characterization. Moreover, we substantiate the occurrence of horizontal gene transfer from prokaryotes to insect orthologs and provide evidence for the subsequent removal of auxiliary domains, likely through a gene fission event. CANDy is available at https://github.com/PyEED/CANDy.
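
As an illustration of what counting "distinct domain assemblies" in a family can mean, the following minimal sketch groups proteins by the N- to C-terminal order of their annotated domains. It is not CANDy's implementation: the protein IDs, coordinates, domain labels, and the function names `architecture` and `count_architectures` are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical annotations: protein ID -> list of (start, end, domain) hits.
# IDs, coordinates, and domain labels are illustrative, not CANDy output.
annotations = {
    "protA": [(450, 560, "CBM3"), (30, 420, "GH48")],
    "protB": [(25, 410, "GH48")],
    "protC": [(20, 400, "GH48"), (430, 540, "CBM3")],
    "protD": [(10, 120, "CBM2"), (150, 540, "GH48")],
}


def architecture(hits):
    """Return the domain names ordered by start position (N- to C-terminus)."""
    return tuple(name for _, _, name in sorted(hits))


def count_architectures(annotations):
    """Count how many proteins share each ordered domain architecture."""
    return Counter(architecture(hits) for hits in annotations.values())


counts = count_architectures(annotations)
for arch, n in counts.most_common():
    # Print each architecture as a dash-joined string with its protein count.
    print("-".join(arch), n)
```

In this sketch two proteins count as having the same architecture only when their domains appear in the same order, so a GH48-CBM3 protein and a CBM3-GH48 protein would be tallied separately.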

MeSH terms

  • Animals
  • Carbohydrate Metabolism
  • Carbohydrates / chemistry
  • Catalytic Domain
  • Glycoside Hydrolases / chemistry
  • Glycoside Hydrolases / genetics
  • Glycoside Hydrolases / metabolism
  • Protein Domains*
  • Software

Substances

  • Glycoside Hydrolases
  • Carbohydrates

Grants and funding

This research was funded by VLAIO-Catalisti (Encaps2Control, HBC.2019.0122) and Germany's Excellence Strategy (EXC 2075, grant 390740016).