So we left you hanging a bit in the previous post, for which we apologise. William of Ockham was about to do battle with Random Forest, armed only with what would appear to be a singularly inadequate razor. We’ll have to apologise again because you’re going to have to wait a while longer for the final showdown. We realise that many of our patient and loyal readers may not have encountered the sorts of predictive models that William of Ockham is licensed to invalidate, so as a public service we’ll take a quick look at three publications. Our objective in this post is not to review these models but merely to use them to show you why studies like these might be of interest to Mr Ockham.
In the first article, industrial researchers present methods for predicting hERG liability in compound libraries using their own data which was not made available to readers or, presumably, the reviewers of this paper. We extend special sympathy to the reviewers of this article because we just can’t tell whether the models described within are useful and highly predictive or of a value that is largely calorific. This is a general theme which we will re-visit in future posts.
In the second article, industrial researchers present methods for prediction of volume of distribution. Volumes of distribution and calculated properties, although not the structures, for the training set compounds were shared as supplemental material.
In the third article, academic researchers present methods for predicting aqueous solubility. Structures and measured solubilities for the training and test sets were shared as supplemental material.
The authors of these articles share their data sets to varying degrees; however, none appear to be particularly forthcoming with the predictive models themselves. The second article presents 31 parameter values for a multi-linear regression model in the supplemental material but the random forest remains an almost complete mystery. Is it fair that a medicinal chemist needs to provide spectral data for new compounds while a predictive modeller can get away with root mean square error and r-squared? Don’t ask us for we are simple folk and we just write The Crapshoot.
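For readers who have not met these two statistics, here is a minimal sketch of how they are usually computed. Everything in it is our own invention for illustration: the function names, the four "observed" values and both sets of "predictions" are made up and come from none of the three articles. The point it tries to make is that two models can disagree on every single compound and still report identical values (here both give a root mean square error of 0.5 and an r-squared of 0.8), so the summary numbers alone tell you little about the model behind them.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total

# Invented numbers, purely for illustration (not data from any article).
y_obs = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.5, 1.5, 3.5, 3.5]  # hypothetical model A
pred_b = [0.5, 2.5, 2.5, 4.5]  # hypothetical model B, differs from A everywhere

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, rmse(y_obs, pred), r_squared(y_obs, pred))
```

This uses the textbook definition of r-squared; some papers report the squared correlation coefficient instead, which can give a different number for the same predictions.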
So if you think that reading an article on predictive modelling of clearance, volume, CYP inhibition, hERG blockade, solubility or plasma protein binding is going to provide you with a practical means to predict any of these quantities, you may wish to prepare yourselves for disappointment.
In the next post, we’ll be taking a look at the ubiquitous problem of over-fitting. William of Ockham is already sharpening his razor.