SG-ML-PLAP: A structure-guided machine learning-based scoring function for protein-ligand binding affinity prediction

Protein Sci. 2025 Jan;34(1):e5257. doi: 10.1002/pro.5257.

Abstract

Computational methods to predict the binding affinity of protein-ligand complexes have been used extensively to design inhibitors for proteins selected as drug targets. In recent years, machine learning (ML) has been increasingly used for the design of drugs and inhibitors. However, ranking compounds according to their experimental binding affinity has remained a major challenge. Therefore, it is necessary to develop ML-based scoring functions (MLSFs) for predicting the binding affinity of protein-ligand complexes. In this work, protein-ligand interaction features, namely extended connectivity interaction fingerprints (ECIF), derived from the PDBbind dataset have been used to build ML models for binding affinity prediction. Benchmarking was carried out on the Comparative Assessment of Scoring Functions (CASF) dataset and also by predicting the binding affinity of unseen protein-ligand complexes whose structural features differ from those present in the training dataset. Furthermore, the performance of the MLSF on redocked CASF complexes generated by AutoDock Vina improved when the training set of crystal structures was supplemented with redocked protein-ligand complexes. The MLSF trained on crystal structures alone using a combination of ECIF and VINA features also predicted the binding affinities of both crystal and docked complexes with high accuracy. Overall, the MLSF developed in this work shows improved performance compared to conventional SFs and several other MLSFs. It will be a valuable resource for identifying novel inhibitors by structure-based virtual screening protocols. The proposed MLSF, SG-ML-PLAP (Structure-Guided Machine-Learning-based Protein-Ligand Affinity Predictor), is freely accessible as a webserver at http://www.nii.ac.in/sg-ml-plap.html.
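
The interaction features mentioned above can be pictured as counts of protein-ligand atom-type pairs that lie within a distance cutoff. The following is a minimal sketch of that general idea only, not the ECIF implementation or the SG-ML-PLAP pipeline: the function names, the single-letter atom types, the toy coordinates, and the 6.0 Å cutoff are all illustrative assumptions.

```python
from collections import Counter
from math import dist


def pair_count_fingerprint(protein_atoms, ligand_atoms, cutoff=6.0):
    """Count (protein_type, ligand_type) pairs within `cutoff` angstroms.

    Each atom is a (type_label, (x, y, z)) tuple. The labels here are
    simplified element symbols (an assumption for illustration); a real
    interaction fingerprint would use richer atom types.
    """
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            if dist(p_xyz, l_xyz) <= cutoff:
                counts[(p_type, l_type)] += 1
    return counts


def to_vector(counts, vocabulary):
    """Turn pair counts into a fixed-length feature vector for an ML model."""
    return [counts.get(pair, 0) for pair in vocabulary]


# Hypothetical toy complex: coordinates are made up, not taken from PDBbind.
protein = [("N", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0)), ("C", (20.0, 0.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("C", (2.0, 0.0, 0.0)), ("O", (0.0, 4.0, 0.0))]

# Fixed ordering of pair types so every complex maps to the same vector layout.
vocabulary = [("N", "C"), ("N", "O"), ("O", "C"), ("O", "O"), ("C", "C")]

features = to_vector(pair_count_fingerprint(protein, ligand), vocabulary)
print(features)
```

A vector of this kind, computed for every complex in a training set, is the sort of input that a random forest, gradient boosted tree, or neural network regressor could be fit on against measured binding affinities.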

Keywords: ECIF; docking; gradient boosted tree; machine learning scoring function; neural network; protein–ligand binding affinity; random forest.

MeSH terms

  • Databases, Protein
  • Ligands
  • Machine Learning*
  • Molecular Docking Simulation
  • Protein Binding*
  • Proteins* / chemistry
  • Proteins* / metabolism
  • Software

Substances

  • Ligands
  • Proteins