USR-VS: a web server for large-scale prospective virtual screening using ultrafast shape recognition techniques

Only a few small-molecule ligands are known for your target? Need a validated tool to find new binders with different chemical scaffolds for this target? In that case, you definitely want to try this user-friendly web server:

An example of its use is described here:

Further details in the associated paper:

Abstract: Ligand-based Virtual Screening (VS) methods aim at identifying molecules with a similar activity profile across phenotypic and macromolecular targets to that of a query molecule used as search template. VS using 3D similarity methods has the advantage of biasing this search toward active molecules with innovative chemical scaffolds, which are highly sought after in drug design to provide novel leads with improved properties over the query molecule (e.g. patentable, of lower toxicity or increased potency). Ultrafast Shape Recognition (USR) has demonstrated excellent performance in the discovery of molecules with previously unknown phenotypic or target activity, with retrospective studies suggesting that its pharmacophoric extension (USRCAT) should obtain even better hit rates once it is used prospectively. Here we present USR-VS, the first web server using these two validated ligand-based 3D methods for large-scale prospective VS. In about 2 seconds, 93.9 million 3D conformers, expanded from 23.1 million purchasable molecules, are screened and the 100 most similar molecules among them in terms of 3D shape and pharmacophoric properties are shown. USR-VS also provides interactive visualization of the similarity of the query molecule against the hit molecules, as well as vendor information to purchase selected hits so that they can be experimentally tested.
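The USR descriptor behind the server is simple enough to sketch. The snippet below is a minimal, illustrative implementation of the original 12-dimensional USR descriptor (the first three moments of the atomic distance distributions from four reference points) and the usual similarity score; it is not the server's actual code, and the function names are my own.

```python
import numpy as np

def usr_descriptor(coords):
    """12-dimensional USR shape descriptor for one conformer.

    coords: (N, 3) array of atomic coordinates.
    Reference points: molecular centroid (ctd), closest atom to ctd (cst),
    farthest atom from ctd (fct), and farthest atom from fct (ftf).
    """
    ctd = coords.mean(axis=0)
    d_ctd = np.linalg.norm(coords - ctd, axis=1)
    cst = coords[d_ctd.argmin()]
    fct = coords[d_ctd.argmax()]
    d_fct = np.linalg.norm(coords - fct, axis=1)
    ftf = coords[d_fct.argmax()]

    desc = []
    for ref in (ctd, cst, fct, ftf):
        d = np.linalg.norm(coords - ref, axis=1)
        mu = d.mean()
        sigma = d.std()
        skew = ((d - mu) ** 3).mean()
        # cube root keeps all three moments in distance units
        desc += [mu, sigma, np.cbrt(skew)]
    return np.array(desc)

def usr_similarity(q, t):
    """USR score in (0, 1]: inverse of 1 + mean Manhattan distance."""
    return 1.0 / (1.0 + np.abs(q - t).mean())
```

Because the descriptor is built purely from interatomic distance distributions, it is invariant to translation and rotation of the conformer, which is what makes screening 93.9 million conformers in seconds feasible: no alignment step is needed, just a 12-number comparison per conformer.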

Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening

This review covers the relatively new topic of machine-learning scoring functions for docking. A PDF is available at

This is the abstract:

Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure-based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine-learning regression models, which do not impose a predetermined functional form and thus are able to effectively exploit much larger amounts of experimental data, have recently been introduced. These machine-learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by a machine-learning approach based on nonlinear regression allied with comprehensive data-driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets, and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of an SF on a particular target class, generating synthetic data to improve predictive performance and modelling guidelines for SF development.
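The central claim, that a fixed linear functional form underfits once the underlying structure-activity relationship is nonlinear, is easy to demonstrate on synthetic data. The sketch below is only an illustration: the review is about random forests and related models (e.g. RF-Score), but to keep this self-contained I use k-nearest-neighbour regression as the stand-in nonlinear model, and the "structural features" and affinities are invented, not real docking data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "complexes": 3 structural features per complex, with a
# deliberately nonlinear relationship to binding affinity plus noise.
X = rng.uniform(0, 1, size=(2300, 3))
y = (np.sin(2 * np.pi * X[:, 0]) * X[:, 1] + X[:, 2] ** 2
     + 0.1 * rng.standard_normal(len(X)))
X_train, y_train = X[:2000], y[:2000]
X_test, y_test = X[2000:], y[2000:]

def linear_sf(X_tr, y_tr, X_te):
    """Classical-style SF: least-squares linear regression on the features."""
    A = np.hstack([X_tr, np.ones((len(X_tr), 1))])        # intercept column
    w, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.hstack([X_te, np.ones((len(X_te), 1))]) @ w

def knn_sf(X_tr, y_tr, X_te, k=8):
    """Machine-learning-style SF: nonparametric k-NN regression,
    imposing no functional form on the feature-affinity relationship."""
    preds = np.empty(len(X_te))
    for i, x in enumerate(X_te):
        nearest = np.argsort(np.linalg.norm(X_tr - x, axis=1))[:k]
        preds[i] = y_tr[nearest].mean()
    return preds

def rmse(pred):
    return float(np.sqrt(np.mean((pred - y_test) ** 2)))

lin_rmse = rmse(linear_sf(X_train, y_train, X_test))
knn_rmse = rmse(knn_sf(X_train, y_train, X_test))
print(f"linear SF RMSE: {lin_rmse:.3f} | nonlinear (k-NN) SF RMSE: {knn_rmse:.3f}")
```

On this data the nonlinear regressor's error is well below the linear one's, and, unlike the linear model, it keeps improving as more training points are added, which mirrors the scaling argument made in the review.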


How Reliable Are Ligand-Centric Methods for Target Fishing?

We recently published a paper on computational methods for molecular target prediction. A PDF of this article can be downloaded at

Computational methods for Target Fishing (TF), also known as Target Prediction or Polypharmacology Prediction, can be used to discover new targets for small-molecule drugs. This may result in repositioning the drug in a new indication or improving our current understanding of its efficacy and side effects. While there is a substantial body of research on TF methods, there is still a need to improve their validation, which is often limited to a small part of the available targets and not easily interpretable by the user. Here we discuss how target-centric TF methods are inherently limited by the number of targets that they can possibly predict (this number is by construction much larger in ligand-centric techniques). We also propose a new benchmark to validate TF methods, which is particularly suited to analysing how predictive performance varies with the query molecule. On average over approved drugs, we estimate that only five predicted targets would have to be tested to find two true targets with submicromolar potency (a strong variability in performance is, however, observed). In addition, we find that an approved drug currently has an average of eight known targets, which reinforces the notion that polypharmacology is a common and pronounced phenomenon. Furthermore, with the assistance of a control group of randomly selected molecules, we show that the targets of approved drugs are generally harder to predict. The benchmark and a simple target prediction method to use as a performance baseline are available at
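The kind of simple ligand-centric baseline mentioned at the end can be sketched in a few lines: score each target by the highest 2D similarity between the query molecule and any ligand annotated with that target, then rank targets by that score. The toy bit-set fingerprints and annotations below are invented for illustration (real use would involve e.g. ECFP fingerprints computed with RDKit over ChEMBL bioactivity data), and this is not the paper's released baseline, just the underlying idea.

```python
from collections import defaultdict

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def predict_targets(query_fp, annotations, top_n=5):
    """Ligand-centric 1-NN target fishing: each target is scored by the
    maximum Tanimoto similarity between the query and its known ligands."""
    scores = defaultdict(float)
    for fp, target in annotations:
        scores[target] = max(scores[target], tanimoto(query_fp, fp))
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]

# Toy annotation set: (fingerprint as set of on-bits, known target)
annotations = [
    ({1, 2, 3, 4}, "EGFR"),
    ({1, 2, 3, 9}, "EGFR"),
    ({7, 8, 9, 10}, "hERG"),
    ({2, 3, 4, 5}, "COX-2"),
]
print(predict_targets({1, 2, 3}, annotations))
# -> [('EGFR', 0.75), ('COX-2', 0.4), ('hERG', 0.0)]
```

Note that a ligand-centric method like this can, in principle, predict any target with at least one annotated ligand, which is the point made above about the number of predictable targets being much larger than in target-centric approaches.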

The Wayback Machine

I was just looking at one of my favourite websites, the Internet Archive, which is great for diving into the past. Or at least the part of the past that can be captured digitally (web, audio, video, …).

In the web section, the Wayback Machine, one can see how a particular website looked many years ago. Check, for instance, the modern look of the EBI website today. Now have a look at this website on 6 June 1997. What a change, right? Most links are still active, so you can inspect the services available at that time, who was around, etc. If you wish to explore the website at some other moment in time, there is a navigation bar at the top which takes you to the other available captures.

Warning: this is a serious time sink!

Searching for two postdocs for my research lab in Marseille

I have just become a group leader at the CRCM in Marseille. I am therefore currently searching for postdocs to work in areas related to bioinformatics and drug discovery informatics.

The first post is to work on modelling cancer pharmacogenomics:

The second post will investigate new methods for drug polypharmacology prediction:

Both positions will be for two years in the first instance. The deadline for applications is Friday 17 October 2014.

Annual Symposium of MRC Fellows at BMA House

Every year, the MRC organises a one-day symposium for its fellows, which is also attended by MRC panel members and staff. In addition to networking opportunities, a number of very informative sessions are organised, such as those on “Grant Writing”, “Establishing Successful Partnerships and Collaborations”, “Board and Panels – How do they Work?” and “Mentoring”. This year the symposium took place two days ago at BMA House in London.

A recurrent topic in these meetings has been progress on the Crick Institute. Jim Smith, research director of the Crick Institute, explained how they plan to appoint a number of early-career scientists and provide them with group leader funding for 12 years. Twelve years is the limit of tenure for these positions, three years longer than in similar schemes (e.g. at the EMBL), intended to facilitate a balance between career and family commitments.

As has been the case in previous years, Professor Sir John Savill, chief executive of the MRC, closed the symposium. He highlighted computational biology as one of the areas needing stronger support in the future. When he made a similar comment about bioinformatics last year, I asked him how the MRC was planning to intensify its already existing support. In his reply, he highlighted the work done at the EBI, and noted that further funding for research at this type of institution was one route to strengthening the area.