A machine learning approach to docking: improving SFC models

The Scoring Function Consortium (SFC) was a collaborative effort between several pharmaceutical companies and the Cambridge Crystallographic Data Centre (CCDC) to compile structural data and use it to set up different training sets for the parameterization of new scoring functions. Over 60 different descriptors were evaluated for all complexes, which led to the most accurate scoring functions at that time.

Recently, the group of one of the leading authors in the SFC, Prof. Sotriffer, published a paper with improved SFC scoring functions. By applying our proposed machine learning approach, the correlation achieved by SFC scoring functions increased from 0.64 to 0.78 on the PDBbind benchmark. This is a very large improvement for this problem, especially considering that only the regression model was changed (the descriptors, training set and test set remained the same). The performance on the diverse CSAR-NRC set was also very high. It would be very interesting to see how this new scoring function compares to the other functions tested on the CSAR-NRC test set once the same training set is used for all of them. As explained here, the anonymous scoring functions tested on CSAR-NRC used different training sets, whose composition was not disclosed but which are known to overlap with complexes in this test set. More information about SFC scoring functions can be found in the slides of a talk given at the 3rd Strasbourg Summer School on Chemoinformatics.
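To make the "only the regression model was changed" point concrete, here is a minimal sketch of that kind of comparison: two regressors are fitted on exactly the same descriptors and training set, then scored by correlation on the same test set. Everything in it is an assumption for illustration. The data are synthetic, the two descriptors and the `make_complexes` helper are hypothetical, and the k-nearest-neighbour regressor is only a simple stand-in for a nonlinear model; it is not claimed to be the method used in the paper.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


class LinearRegressor:
    """Ordinary least squares with an intercept (the linear baseline)."""

    def fit(self, X, y):
        Z = [[1.0] + list(row) for row in X]  # prepend intercept column
        d = len(Z[0])
        A = [[sum(z[i] * z[j] for z in Z) for j in range(d)] for i in range(d)]
        b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)]
        self.w = solve(A, b)  # normal equations: (Z^T Z) w = Z^T y
        return self

    def predict(self, X):
        return [self.w[0] + sum(w * x for w, x in zip(self.w[1:], row))
                for row in X]


class KNNRegressor:
    """Mean of the k nearest training targets (a nonlinear stand-in model)."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        out = []
        for row in X:
            order = sorted(range(len(self.X)),
                           key=lambda i: math.dist(row, self.X[i]))
            out.append(sum(self.y[i] for i in order[:self.k]) / self.k)
        return out


def make_complexes(n, rng):
    """Hypothetical data: two descriptors, affinity nonlinear in the first."""
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    y = [x1 ** 2 + 0.5 * x2 + rng.gauss(0, 0.05) for x1, x2 in X]
    return X, y


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit on the training set and report test-set Pearson correlation."""
    return pearson(model.fit(X_train, y_train).predict(X_test), y_test)


rng = random.Random(0)
X_train, y_train = make_complexes(300, rng)
X_test, y_test = make_complexes(100, rng)

# Same descriptors, same training set, same test set; only the model differs.
r_linear = evaluate(LinearRegressor(), X_train, y_train, X_test, y_test)
r_knn = evaluate(KNNRegressor(k=5), X_train, y_train, X_test, y_test)
print(f"linear: {r_linear:.2f}  k-NN: {r_knn:.2f}")
```

Because the data and the split are held fixed, any difference between the two correlations in this toy setup comes from the regression model alone, which is the property that makes the comparison in the post meaningful.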
