Accurately predicting the binding affinities of a large, diverse set of ligands to one or more protein targets is extremely important in drug discovery, yet it remains very challenging. Over the last decades, machine learning scoring functions (MLSFs) have greatly impacted computer-aided drug design. Our “Rescoring” module provides several ML algorithms (random forests, neural networks) and published feature representations to help you with this task.
If you already have results from a virtual screening run, you can use the “Rescoring” module to estimate binding affinities with different MLSFs. The whole rescoring process is powered by our free and open-source tool, the Open Drug Discovery Toolkit (ODDT).
Many programs can analyze receptor–ligand binding, and their output can potentially serve as input for our “Rescoring” module. You have to provide two separate files: the structures of your ligands docked into the binding site of your protein target (SDF format) and the coordinates of your protein structure (PDB format).
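Before uploading, it can be useful to check that both input files contain what you expect. The following is a minimal sketch using only the Python standard library; it is not part of the “Rescoring” module. The function names and the toy file contents are hypothetical. It relies only on two facts about the formats: every SDF record ends with a `$$$$` line, and PDB coordinate records start with `ATOM` or `HETATM`.

```python
def count_sdf_records(sdf_text: str) -> int:
    """Count molecule records in SDF text; each record ends with a '$$$$' line."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


def count_pdb_atoms(pdb_text: str) -> int:
    """Count coordinate records (ATOM/HETATM lines) in PDB text."""
    return sum(
        1 for line in pdb_text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    )


# Toy inputs for illustration only (two empty poses, three coordinate records).
EXAMPLE_SDF = "\n".join([
    "pose_1", "  toy", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END", "$$$$",
    "pose_2", "  toy", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END", "$$$$",
])
EXAMPLE_PDB = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C",
    "HETATM    3  O   HOH A   2       9.000   9.000   9.000  1.00  0.00           O",
    "END",
])

n_poses = count_sdf_records(EXAMPLE_SDF)
n_atoms = count_pdb_atoms(EXAMPLE_PDB)
print(f"{n_poses} docked poses, {n_atoms} protein coordinate records")
```

With real files, you would read the text from disk first (for example with `open(path).read()`) and confirm that the number of poses matches the output of your docking program.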
The results from the selected scoring functions are displayed immediately on the results page in a simple table and can also be downloaded as a CSV file.
RF-Score is a first-in-class machine-learning scoring function for structure-based binding affinity prediction of protein-ligand complexes. It circumvents the need for problematic modeling assumptions via nonparametric machine learning: Random Forest is used to implicitly capture binding effects that are hard to model explicitly. Compared with the state of the art on the demanding PDBbind benchmark, RF-Score proved to be a very competitive scoring function.
To learn more about RF-Score and its concept, please see the original paper (Ballester and Mitchell, 2010).
RF-Score-VS is a novel Random Forest-based scoring function designed to evaluate the results of high-throughput in silico docking experiments with an improved enrichment factor. This is achieved by including negative data (inactive ligand complexes, i.e. target-decoy structural complexes) in the training set. Its descriptors are based on RF-Score (Ballester and Mitchell, 2010). RF-Score-VS was trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. The evaluation results show that RF-Score-VS can substantially improve virtual screening performance: the top 1% ranked by RF-Score-VS gives a 55.6% hit rate, whereas Vina gives only 16.2% (for smaller fractions the difference is even more encouraging: the top 0.1% achieves an 88.6% hit rate versus 27.5% for Vina). In addition, RF-Score-VS predicts measured binding affinity much better than Vina (Pearson correlation of 0.56 and −0.18, respectively). RF-Score-VS was also tested on an independent test set from the DEKOIS benchmark, with results comparable to those above (Wójcikowski et al., 2017).
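The two metrics quoted above, the hit rate in the top fraction of a ranked list and the Pearson correlation, can be sketched as follows. This is only an illustration on invented toy data, not the evaluation protocol of the cited study. The function names are hypothetical, and the sketch assumes that a higher score means a better compound (for Vina-style energies, where lower is better, the sign would have to be flipped first).

```python
import math


def hit_rate_at_top(scores, is_active, fraction):
    """Share of actives among the top `fraction` of compounds (higher score = better)."""
    n_top = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, active in ranked[:n_top] if active)
    return hits / n_top


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data: ten compounds with predicted scores and known activity labels.
toy_scores = [9.1, 8.7, 8.2, 7.9, 6.5, 6.0, 5.5, 5.1, 4.8, 4.0]
toy_active = [True, True, False, True, False, False, False, True, False, False]

print("hit rate, top 20%:", hit_rate_at_top(toy_scores, toy_active, 0.2))
print("hit rate, top 50%:", hit_rate_at_top(toy_scores, toy_active, 0.5))
```

The hit rate answers a practical question: if only the best-scored fraction of compounds is tested experimentally, what share of them is expected to be active?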
Like the original RF-Score function, RF-Score-VS is available in three different flavors:
All versions use the same distance cutoff: an atom pair is counted as interacting when the distance between the two atoms falls within the 12 Å cutoff.
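The cutoff rule above can be illustrated with a small sketch that counts ligand-protein element pairs lying within 12 Å of each other. It is not the ODDT implementation: the function name, the input layout (a list of element and coordinate tuples) and the strict `<` comparison at the boundary are assumptions made for this example, and the coordinates are invented.

```python
import math
from collections import Counter

CUTOFF = 12.0  # distance cutoff in angstroms, as stated in the text


def contact_counts(ligand_atoms, protein_atoms, cutoff=CUTOFF):
    """Count (ligand element, protein element) pairs closer than `cutoff`.

    Each atom is given as (element, (x, y, z)).
    """
    counts = Counter()
    for lig_element, lig_xyz in ligand_atoms:
        for prot_element, prot_xyz in protein_atoms:
            # Assumption: a pair exactly at the cutoff distance is not counted.
            if math.dist(lig_xyz, prot_xyz) < cutoff:
                counts[(lig_element, prot_element)] += 1
    return counts


# Toy complex: two ligand atoms and three protein atoms.
toy_ligand = [("C", (0.0, 0.0, 0.0)), ("O", (1.5, 0.0, 0.0))]
toy_protein = [
    ("N", (3.0, 0.0, 0.0)),
    ("C", (0.0, 10.0, 0.0)),
    ("C", (0.0, 20.0, 0.0)),  # farther than 12 angstroms from both ligand atoms
]

features = contact_counts(toy_ligand, toy_protein)
for pair, count in sorted(features.items()):
    print(pair, count)
```

The resulting counts per element pair form a fixed-length feature vector once a fixed list of element pairs is chosen, which is the kind of input a Random Forest model can be trained on.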
PLEC is a method for representing ligand-receptor complexes based on local atomic environments (Wójcikowski et al., 2019). It gives a very fast and accurate description of the atomic interactions in a complex, which can serve as the basis for training and optimizing prediction models, e.g. for binding affinity. PLEC yields consistent predictive results across various machine-learning (ML) models. Even a linear model built on PLEC gives better results than competing fingerprints such as SILIRID or SPLIF, or more advanced scoring functions such as RF-Score.
PLECscore is a novel scoring function based on the Protein–Ligand Extended Connectivity (PLEC) fingerprint (FP), which implicitly encodes protein–ligand interactions by pairing the ECFP environments from the ligand and the protein. PLEC FPs were used to construct different machine learning models tailored for predicting protein–ligand affinities (pKi∕d). Even the simplest linear model built on the PLEC FP achieved Rp = 0.817 on the PDBbind v2016 ‘core set’, demonstrating its descriptive power.
The underlying model can be one of:
For details, see the PLEC publication (Wójcikowski et al., 2019).
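The idea of pairing ligand and protein environments can be sketched in a few lines. This is a simplified illustration, not the PLEC algorithm as published or as implemented in ODDT. It assumes that integer identifiers of the atomic environments have already been computed elsewhere, and that the list of atom pairs in contact is given. The function names, the SHA-256 hashing and the fingerprint size of 1024 are all choices made for this example.

```python
import hashlib


def pair_index(lig_env: int, prot_env: int, size: int) -> int:
    """Hash one (ligand environment, protein environment) pair to a position."""
    digest = hashlib.sha256(f"{lig_env}:{prot_env}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


def pairing_fingerprint(contacts, size=1024):
    """Fold paired environment identifiers into a fixed-length count vector.

    `contacts` holds (ligand_env_id, protein_env_id) tuples for atoms in contact.
    """
    fingerprint = [0] * size
    for lig_env, prot_env in contacts:
        fingerprint[pair_index(lig_env, prot_env, size)] += 1
    return fingerprint


# Hypothetical environment identifiers for three ligand-protein contacts.
toy_contacts = [(101, 9001), (101, 9002), (205, 9001)]
toy_fp = pairing_fingerprint(toy_contacts)
print("non-zero positions:", [i for i, v in enumerate(toy_fp) if v])
```

Because the vector has a fixed length regardless of the size of the complex, it can be passed directly to a linear model, a random forest or a neural network.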
NN-Score is a neural network-based scoring function designed to aid the computational identification of small-molecule ligands. Its evaluation, which compared NN-Score directly with AutoDock and Vina using two different metrics of docking efficacy and nine distinct receptor systems, showed that neural networks can be effective scoring functions. The version offered here is a Python re-implementation of NN-Score (version 2), one of the first scoring functions to use a neural network architecture. It was trained on a less commonly used dataset of 3D complexes and can therefore provide additional enrichment and validation.
A novel deep neural network that estimates the binding affinity of ligand–receptor complexes. The complex is represented as a 3D grid, and the model applies a 3D convolution to produce a feature map of this representation, treating protein and ligand atoms in the same manner. The network was tested on the CASF-2013 ‘scoring power’ benchmark and the Astex Diverse Set, where it outperformed classical scoring functions.
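The 3D grid representation described above can be illustrated by mapping atom coordinates onto the cells of a cubic grid. This is a minimal sketch, not the network's actual featurization. The function name, the 20 Å box, the 1 Å resolution and the sparse dictionary output are assumptions for this example. A real model would store several feature channels per cell rather than a single atom count.

```python
def voxelize(atoms, center, box_size=20.0, resolution=1.0):
    """Map atom coordinates to a sparse cubic grid centered at `center`.

    Returns ({(i, j, k): atom_count}, cells_per_axis). Protein and ligand
    atoms are handled identically; atoms outside the box are dropped.
    """
    cells_per_axis = int(round(box_size / resolution)) + 1
    half = box_size / 2.0
    grid = {}
    for xyz in atoms:
        index = tuple(
            int(round((coord - origin + half) / resolution))
            for coord, origin in zip(xyz, center)
        )
        if all(0 <= i < cells_per_axis for i in index):
            grid[index] = grid.get(index, 0) + 1
    return grid, cells_per_axis


# Toy complex: ligand and protein atoms placed in one list, as in the text.
toy_atoms = [
    (0.0, 0.0, 0.0),   # lands in the central cell
    (1.2, 0.0, 0.0),   # lands one cell along x
    (50.0, 0.0, 0.0),  # outside the box, ignored
]
toy_grid, toy_n = voxelize(toy_atoms, center=(0.0, 0.0, 0.0))
print(f"{len(toy_grid)} occupied cells in a {toy_n}x{toy_n}x{toy_n} grid")
```

Centering the box on the ligand keeps the binding site inside the grid, and a regular grid is what allows 3D convolution to be applied.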