Revealing cytotoxic substructures in molecules using deep learning

J Comput Aided Mol Des. 2020 Jul;34(7):731-746. doi: 10.1007/s10822-020-00310-4. Epub 2020 Apr 16.

Abstract

In drug development, late-stage toxicity issues are the main cause of compound failure in clinical trials. In silico methods are therefore highly important for guiding the early design process and for reducing time, costs and animal testing. Technical advances and the ever-growing amount of available toxicity data have enabled machine learning, especially neural networks, to impact the field of predictive toxicology. In this study, cytotoxicity prediction, one of the earliest handles on toxicity in drug discovery, is investigated using a deep learning approach. The model is trained on a highly consistent in-house data set of over 34,000 compounds, of which less than 5% are cytotoxic. The model reached a balanced accuracy of over 70%, comparable to previously reported studies using Random Forest. Although neural networks yield good results, they are often described as black boxes that offer little mechanistic understanding of the underlying model. To address this lack of interpretability, a Deep Taylor Decomposition method is investigated to identify substructures that may be responsible for cytotoxic effects, the so-called toxicophores. Furthermore, this study introduces cytotoxicity maps, which provide a visual structural interpretation of the relevance of these substructures. This approach could be helpful in drug development, both to predict the potential toxicity of a compound and to generate new insights into toxic mechanisms. It could also help to de-risk and optimize compounds.
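The abstract does not specify the network architecture, the molecular input representation or the exact Deep Taylor Decomposition rules used. As a minimal sketch under those gaps, the Python example below applies the z+ rule of Deep Taylor Decomposition to a tiny two-layer ReLU network over a hypothetical 4-bit substructure fingerprint. The output score is redistributed layer by layer onto the input bits, so active bits that receive high relevance would be the candidate toxicophores. The fingerprint, the hand-picked weights and the function names (zplus_relevance, explain) are illustrative assumptions, not the authors' model. The z+ rule assumes non-negative inputs, which holds for binary fingerprints and ReLU activations.

```python
import numpy as np


def zplus_relevance(a, W, R_upper, eps=1e-9):
    """Deep Taylor z+ rule for one dense layer (illustrative sketch).

    a:       non-negative input activations, shape (n_in,)
    W:       weight matrix, shape (n_in, n_out)
    R_upper: relevance of the layer's outputs, shape (n_out,)
    Returns the relevance of each input, shape (n_in,).
    """
    Wp = np.maximum(W, 0.0)        # z+ rule: keep only positive weights
    z = a @ Wp + eps               # total positive contribution to each output
    s = R_upper / z                # relevance per unit of contribution
    return a * (Wp @ s)            # R_i = a_i * sum_j w_ij^+ * R_j / z_j


def explain(x, W1, b1, W2, b2):
    """Forward pass through a two-layer ReLU net (hypothetical toy model),
    then propagate the output score back to the input bits with the z+ rule.
    Biases receive no relevance in this sketch."""
    h = np.maximum(0.0, x @ W1 + b1)      # hidden ReLU activations
    out = np.maximum(0.0, h @ W2 + b2)    # non-negative "cytotoxicity" score
    R_h = zplus_relevance(h, W2, out)     # output relevance starts as the score
    R_x = zplus_relevance(x, W1, R_h)     # relevance of each fingerprint bit
    return out, R_x


# Hypothetical 4-bit substructure fingerprint and hand-picked toy weights.
x = np.array([1.0, 0.0, 1.0, 1.0])
W1 = np.array([[ 0.5, -0.2,  0.1],
               [ 0.3,  0.4, -0.5],
               [-0.1,  0.6,  0.2],
               [ 0.2, -0.3,  0.4]])
b1 = np.array([0.0, -0.2, 0.0])
W2 = np.array([[0.8], [0.5], [-0.4]])
b2 = np.array([0.0])

out, R_x = explain(x, W1, b1, W2, b2)
print("score:", out[0])
print("bit relevances:", np.round(R_x, 3))
```

Because the z+ rule divides each output's relevance among its positive contributions only, the input relevances sum (up to eps) to the output score. In this toy case bit 0 receives the largest share, the inactive bit 1 receives none, and bit 2 receives none either, because its positive weights feed only hidden units that carry no relevance. In a real model, mapping per-bit relevances back onto the atoms of the matching substructures would be one plausible route to a picture like the cytotoxicity maps mentioned in the abstract; the abstract does not say how the study builds them.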

Keywords: Cytotoxic substructures; Deep Neural Networks; Deep Taylor Decomposition; Toxicophores.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Cell Survival / drug effects
  • Computer-Aided Design
  • Cytotoxins / chemistry*
  • Cytotoxins / toxicity*
  • Deep Learning*
  • Drug Design
  • Drug Discovery / methods*
  • Drug Discovery / statistics & numerical data
  • HEK293 Cells
  • Hep G2 Cells
  • Humans
  • Models, Biological
  • Neural Networks, Computer
  • Small Molecule Libraries
  • Software
  • Toxicology / statistics & numerical data

Substances

  • Cytotoxins
  • Small Molecule Libraries