PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery

Huaqing Liu; Peiyi Chen; Xiaochen Zhai; Ku-Geng Huo; Shuxian Zhou; Lanqing Han; Guoxin Fan

doi:10.1038/s41597-024-03997-4

Sci Data. 2024 Dec 3;11(1):1316. doi: 10.1038/s41597-024-03997-4.

Authors

Huaqing Liu^#¹, Peiyi Chen^#¹, Xiaochen Zhai², Ku-Geng Huo³, Shuxian Zhou¹, Lanqing Han⁴,⁵, Guoxin Fan⁶

Affiliations

¹ Artificial Intelligence Innovation Center, Research Institute of Tsinghua, Pearl River Delta, Guangzhou, 510700, China.
² Cyagen Biosciences (Suzhou) Inc., Suzhou, 215000, China.
³ Cyagen Biosciences (Guangzhou) Inc., Guangzhou, 510700, China.
⁴ Artificial Intelligence Innovation Center, Research Institute of Tsinghua, Pearl River Delta, Guangzhou, 510700, China. hanlance@tsinghua-gd.org.
⁵ Cyagen Biomodels (Guangzhou) Co., Ltd, Guangzhou, 510700, China. hanlance@tsinghua-gd.org.
⁶ Department of Pain Medicine, Shenzhen Nanshan People's Hospital, Shenzhen University Medical School, Shenzhen, 518056, China. fanguoxin@email.szu.edu.cn.

^# Contributed equally.

PMID: 39627219

Abstract

Prediction of protein-protein binding (PPB) affinity plays an important role in large-molecule drug discovery. Deep learning (DL) has been adopted to predict changes in PPB affinity upon mutation, but few studies have predicted PPB affinity itself. The major reason is the paucity of open-source datasets with PPB affinity data. To address this gap, the current study introduces a large, comprehensive PPB affinity (PPB-Affinity) dataset. The PPB-Affinity dataset contains key information such as crystal structures of protein-protein complexes (with or without protein mutation patterns), PPB affinity, the receptor protein chain, and the ligand protein chain. To the best of our knowledge, this is the largest publicly available PPB affinity dataset, and we believe it will significantly advance drug discovery by streamlining the screening of potential large-molecule drugs. We also developed a deep-learning benchmark model with this dataset to predict PPB affinity, providing a baseline for comparison for the research community.

Publication types

Dataset

MeSH terms

Deep Learning
Drug Discovery*
Ligands
Protein Binding*
Proteins* / chemistry
Proteins* / metabolism

Substances

Proteins
Ligands