Adaptive one-class Gaussian processes allow accurate prioritization of oncology drug targets

Antonio de Falco; Zoltan Dezso; Francesco Ceccarelli; Luigi Cerulo; Angelo Ciaramella; Michele Ceccarelli

doi:10.1093/bioinformatics/btaa968

Bioinformatics. 2021 Jun 16;37(10):1420-1427. doi: 10.1093/bioinformatics/btaa968.

Authors

Antonio de Falco¹, Zoltan Dezso², Francesco Ceccarelli³, Luigi Cerulo¹,⁴, Angelo Ciaramella⁵, Michele Ceccarelli¹,⁶

Affiliations

¹ BIOGEM Istituto di Ricerche Genetiche "G. Salvatore", 83031 Ariano Irpino, Italy.
² ABBVIE Biotherapeutics, Redwood City, CA 94063, USA.
³ Donald Bren School of Information and Computer Sciences (ICS), Irvine, CA 92697, USA.
⁴ Department of Science and Technologies, University of Sannio, 82100 Benevento, Italy.
⁵ Department of Science and Technology, University of Naples Parthenope, 80133 Naples, Italy.
⁶ Department of Electrical Engineering and Information Technology (DIETI), University of Naples "Federico II", 80128 Naples, Italy.

PMID: 33165571

Abstract

Motivation: The cost of drug development has increased dramatically in recent decades, with the number of new drugs approved per billion US dollars spent on R&D halving every year or less. The selection and prioritization of targets is one of the most influential decisions in drug discovery. Here we present a Gaussian Process model for the prioritization of drug targets, cast as a problem of learning with only positive and unlabeled examples.

Results: Since the absence of negative samples rules out standard methods for automatic hyperparameter selection, we propose a novel approach for selecting the kernel hyperparameters in One Class Gaussian Processes. We compare our method with state-of-the-art approaches on benchmark datasets and then show its application to druggability prediction of oncology drugs. Our score reaches an AUC of 0.90 on a set of clinical trial targets, starting from a small training set of 102 validated oncology targets. Our score recovers the majority of known drug targets and can be used to identify a novel set of proteins as drug target candidates.
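For readers unfamiliar with one-class Gaussian processes, the following is a minimal sketch of the general scoring idea, not the authors' adaptive method. It fits a zero-mean GP to positive examples only (all targets set to 1) and ranks query points by the predictive mean and by the negative predictive variance. The RBF kernel, the fixed `length_scale` and `noise` values, the function names, and the synthetic two-dimensional data are all assumptions made for illustration; the paper's contribution is precisely the selection of the kernel hyperparameters, which this sketch does not attempt.

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def ocgp_scores(X_pos, X_query, length_scale=1.0, noise=1e-2):
    """One-class GP scores from positives only (hypothetical helper).

    Returns the predictive mean and the negative predictive variance;
    for both, higher values mean "more similar to the positives".
    """
    n = len(X_pos)
    K = rbf_kernel(X_pos, X_pos, length_scale) + noise * np.eye(n)
    K_star = rbf_kernel(X_pos, X_query, length_scale)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} 1, since every training target equals 1
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, np.ones(n)))
    mean = K_star.T @ alpha
    v = np.linalg.solve(L, K_star)
    var = 1.0 - np.sum(v**2, axis=0)  # k(x, x) = 1 for the RBF kernel
    return mean, -var

def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Synthetic toy data (assumption): positives cluster at the origin,
# the "other" class sits far away. These are not the paper's features.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 0.5, size=(50, 2))
X_test = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)),
                    rng.normal(4.0, 0.5, size=(30, 2))])
y_test = np.array([1] * 30 + [0] * 30)

mean_score, negvar_score = ocgp_scores(X_train, X_test)
auc_mean = auc(mean_score, y_test)
auc_negvar = auc(negvar_score, y_test)
print(f"AUC (mean): {auc_mean:.3f}, AUC (-variance): {auc_negvar:.3f}")
```

On real data the ranking quality depends strongly on `length_scale`, and with no negative examples it cannot be tuned by ordinary cross-validation, which is the gap the abstract describes.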

Availability and implementation: The matrix of features for each protein is available at: https://bit.ly/3iLgZTa. Source code implemented in Python is freely available for download at https://github.com/AntonioDeFalco/Adaptive-OCGP.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Drug Development
Drug Discovery
Pharmaceutical Preparations*
Proteins
Software*

Substances

Pharmaceutical Preparations
Proteins