Explainability in transformer models for functional genomics

Jim Clauwaert; Gerben Menschaert; Willem Waegeman

doi:10.1093/bib/bbab060

Brief Bioinform. 2021 Sep 2;22(5):bbab060. doi: 10.1093/bib/bbab060.

Authors

Jim Clauwaert¹, Gerben Menschaert¹, Willem Waegeman¹

Affiliation

¹ Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000 Gent, Belgium.

Abstract

The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has been successful in gathering insights on the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation. We find that the majority of subunits (attention heads) of the model are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of the transcription process. Because this specialization of the attention heads occurs automatically, we believe transformer models to be of high interest for the creation of explainable neural networks in this field.

Keywords: DNA-binding sites; functional genomics; interpretable neural networks; transformers.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Base Sequence
Binding Sites
DNA, Bacterial / genetics
DNA, Bacterial / metabolism
Deep Learning*
Escherichia coli / genetics*
Escherichia coli / metabolism
Genome, Bacterial*
Genomics / methods*
Promoter Regions, Genetic / genetics
Transcription Factors / genetics
Transcription Factors / metabolism
Transcription Initiation Site*

Substances

DNA, Bacterial
Transcription Factors