Once again we have let things slip and once again we crave the indulgence of our loyal readers. It is well over 3 weeks since we last posted. Today we’ll take a look at a review on druglikeness which appeared in a high impact journal. Strictly it is the parent journal that has the high impact factor but we’re not going to get bogged down by that minor detail. This article has already been cited here in connection with categorical sins. Much (mainly white) noise about it has been made where we work.
Actually we’re not going to review the entire article. Those of you who have nothing better to do than read this column will know that we are typically underwhelmed by publications on druglikeness and believe that as a concept it is rather over-rated. Instead we will focus on one particular piece of data analysis that is described in the target publication. Are we being lazy? Read on and you can make your own minds up. You are adults after all.
We’d now like you to take a look at Figure 3a in the featured article. The horizontal axis is ClogP and the vertical axis is promiscuity. Promiscuity? Are the drugs getting up to something naughty about which we shouldn’t be writing in a family friendly blog? Promiscuity in this plot is defined by the number of assays for which at least 30% inhibition is observed at a concentration of 10 micromolar. The plot suggests a strong relationship between promiscuity and lipophilicity, doesn’t it? Well that’s what the authors of the article want you to think but, as loyal and cultured readers of the Crapshoot, you really should know better by now.
Now let’s take a closer look at Figure 3a. First the horizontal axis is not ClogP but Median ClogP. Where did the median come from? A reasonable question and, if you’ll just let us continue, everything will become abundantly clear. Well sort of abundantly clear. The authors appear to have computed the median ClogP for each value of promiscuity. Why have they done this? The quick answer to this very reasonable question is to go take a look at Box S3 in the supplementary information.
The manner in which Figure 3a has been constructed gives it some rather unusual characteristics. Most importantly each value of promiscuity is represented by a single point regardless of the number of drugs with that value of promiscuity. This distorts the original data by emphasizing the tails of the distribution and we think it’s a rather naughty thing to do. Plotting the data as the authors have done displays the underlying trend in the data while hiding the variation in ClogP for individual values of the promiscuity. This makes the trend easier to see but prevents us from knowing how strong it really is.
One common approach to quantifying the strength of the relationship between two properties is to fit one to the other using regression. Typically one starts by assuming a linear relationship but other functional forms (e.g. polynomial) are used if the plot suggests non-linearity. One measure of the quality of fit is r-squared, which is the proportion of the variance in the dependent variable that is explained by the regression model. R-squared ranges in value from 0 (no fit) to 1 (perfect fit).
Now let’s go back to Figure 3. It appears the authors have done the linear regression on the summary data shown in Figure 3a rather than the full set of original data. They quote an r value of 0.83 which corresponds to an r-squared of 0.69. It’s a good time to take another look at Box S3 in the supplementary material. The data from which the summary shown in Figure 3a was generated is distributed between two plots, one for acids and bases and the other for neutrals, quaternary bases and zwitterions. We were a little curious about how the ClogP values were derived for the quaternary bases and why the authors decided to group the charge types as they did. However that is not a path that we wish to go down right now and we’ll not make further mention of these concerns. The plots show that promiscuity will be low when ClogP is very low. However maintaining potency when ClogP is that low is simply not going to be an option for many targets and you’re going to run into permeability problems if you drop ClogP too far. The question we’d like to pose to you, our loyal readers, is whether you’d expect an r-squared value of 0.69 for either of the two plots in Box S3.
Let’s pause for a moment to review what we’ve learned. Firstly, quote r rather than r-squared because the latter can never exceed the former and your less alert readers may not even notice. Secondly, and more importantly, averaging (in this case taking the median) of one variable over each of the categories of the other is likely to give you an optimistic view of the strength of the underlying relationship. This is the basis of Categorical Sin and, to help convince you of the fundamental sinfulness of analyzing data in this manner, consider the situation in which there are only two categories of promiscuity (yes or no). Now suppose the median ClogP values are different for the two categories. What do you expect r-squared to be? Everybody get 1? It really is an honor to write for such clever, cultured readers.
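For those readers who would rather run the sermon than sit through it, here is a minimal sketch of the effect. The data are entirely synthetic and invented by us for illustration (they are not the data behind Figure 3a or Box S3), and the variable names are our own. For each promiscuity value we scatter ClogP widely around a centre that drifts only slightly upwards, then compare r-squared for all the points with r-squared for one median point per promiscuity value.

```python
from statistics import median


def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out to keep the sketch self-contained."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Synthetic data (invented for illustration, NOT taken from the article):
# for each promiscuity value the ClogP values are spread widely around a
# centre that drifts only slightly upwards with promiscuity.
OFFSETS = [-2.0, -1.0, 0.0, 1.0, 2.0]
raw_clogp, raw_promiscuity = [], []
for promiscuity in range(6):
    for offset in OFFSETS:
        raw_clogp.append(0.2 * promiscuity + offset)
        raw_promiscuity.append(promiscuity)

# The summary: one point per promiscuity value, placed at the median ClogP.
summary_promiscuity = sorted(set(raw_promiscuity))
summary_clogp = [
    median(c for c, p in zip(raw_clogp, raw_promiscuity) if p == value)
    for value in summary_promiscuity
]

r_squared_raw = pearson_r(raw_clogp, raw_promiscuity) ** 2
r_squared_summary = pearson_r(summary_clogp, summary_promiscuity) ** 2

print(f"r-squared, all points:   {r_squared_raw:.3f}")
print(f"r-squared, medians only: {r_squared_summary:.3f}")
print(f"r of 0.83 squared:       {0.83 ** 2:.2f}")
```

In this contrived set the trend explains well under a tenth of the variance in the full data, yet the medians fall on a perfect straight line. Real data will be less tidy, but the direction of the distortion is the same.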
Sadly this is not the only example of Categorical Sin that we have encountered in the peer-reviewed literature (see 1 , 2). Why do the reviewers not pick these things up? It is for journal editors to fret over and it would be grossly unfair to speculate about possible family connections with the unfortunate rogue trader who famously lost his Barings in the city state of Singapore.
Friday, May 16, 2008
Breaking stone in Changi
Sunday, April 20, 2008
A year of The Crapshoot
It is now exactly one year since The Crapshoot made its first appearance. As is customary on these occasions, we would like to thank all our readers, especially those who have commented on posts. We are especially grateful to the authors of articles that have featured in our literature reviews and hope that the occasional less than flattering commentary has not been taken personally.
Saturday, April 19, 2008
Substituents, Potencies and Pinschers
So hopefully the suspense has built from our previous post on the effects of common chemical substituents on ligand potency. Some of our Loyal Readers will have been annoyed to have been dropped just as it was getting interesting and we can only offer our most abject and grovelling apologies. The suspense had to be built but, much more importantly, we had to go to the pub. They serve a particularly tasty cider there. It’s cloudy, quite strong and, on a bad night, really makes your tongue tingle. If you die from drinking it, it is improbable that your remains will require embalming.
We are sorry that so much time has passed since the previous post. Here’s the link to the target article. We will continue to focus on Table 1.
We have already discussed the categorical sins committed in slicing the tails off distributions to create the F(-1) and F(1) descriptors and you don’t need to be a Doberman Pinscher to appreciate the fundamental immorality of these actions. Nevertheless, slicing distributions is a commonly encountered data-analytic technique in drug discovery research although it tends to be less commonly encountered in statistical textbooks. One recurring concern we have with this data-analytic genre is that the slice points can be chosen to strengthen the conclusion that the investigators would like to draw. Our challenge to the distribution slicers is to demonstrate that the results of their analyses are relatively insensitive to how the slicing is done. Or perhaps consider methods to compare the distributions that adequately account for their continuous nature.
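To show what we mean by sensitivity to the slice point, here is a small sketch. The two lists of changes in logIC50 are hypothetical numbers that we made up for substituents we have called A and B, and the function name is ours. All it does is compute the fraction of changes at or below a cutoff for two different cutoffs.

```python
def fraction_at_or_below(changes, cutoff):
    """Fraction of potency changes that are at or below the cutoff."""
    return sum(1 for change in changes if change <= cutoff) / len(changes)


# Hypothetical changes in logIC50 for two made-up substituents, A and B.
# These numbers are invented to make the point and come from no real assay.
substituent_a = [-2.0, -1.2, -0.9, -0.8, 0.0, 0.3, 0.5, 1.0]
substituent_b = [-1.6, -1.1, -1.05, -0.2, 0.1, 0.4, 0.6, 0.9]

for cutoff in (-1.0, -0.75):
    fraction_a = fraction_at_or_below(substituent_a, cutoff)
    fraction_b = fraction_at_or_below(substituent_b, cutoff)
    ahead = "A" if fraction_a > fraction_b else "B"
    print(f"cutoff {cutoff:+.2f}: A = {fraction_a:.3f}, B = {fraction_b:.3f}, ahead: {ahead}")
```

With these invented numbers, B looks the better bet for a potency gain when the slice is made at -1.0 and A looks better when it is made at -0.75. An analysis that is robust should not change its mind quite so easily.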
However it is not just the distribution slicing that disturbs our digestion. In order to make the next point we now ask that you ignore the previous paragraph and assume that the binary categorisation by distribution slicing is actually correct. ‘Why do you play these games with our minds?’ we hear you cry and we simply ask that you make the assumption regardless of how absurd it may seem to you (and us) right now. We ask because there are still a number of outstanding issues with the analysis presented in Table 1 of the featured article and it’s just easier to deal with these if you’re not distracted by whether the categorisation is indeed sinful.
In the interests of time we’ll skirt over the arbitrariness of the choice of methyl as the substituent with which distributions for other substituents are compared. We will also skirt over why one would want to compare the effects of substituents with any particular substituent given that these effects have already been defined with respect to hydrogen. If people choose to compare their substituent effects with methyl (or any of the other 52 substituents in Table 1) then it is really not for us to say. The Crapshoot is a liberal, pro-choice sort of column and we believe that Our Loyal Readers are sufficiently mature to take responsibility for those choices that they make.
More serious is the manner in which the casual reader might think that all the distributions that are significantly different from methyl are indeed significantly different from this substituent. A contingency table analysis provides a probability that the observed effect could have been observed by chance alone. The lower the probability, the greater the significance. This is the way of The Statistician. Take another look at Table 1 in the featured article. The entry for F in the eighth column (*) tells us that the fraction of chlorine substitutions (0.064) that lead to an at least 10-fold potency increase and the corresponding figure for methyl (0.053) are significantly different with an associated probability of less than 0.05. This means we are at least 95% sure that the distributions for methyl and chloro are different although it doesn’t mean that we necessarily care. Now let’s suppose we perform two contingency table analyses to compare the effects of substituents X and Y with methyl and get 95% in each case. Does that mean that we are 95% sure that X is significantly different from methyl and that Y is significantly different from methyl? Well not exactly. If you want to consider both the substituents, you need to multiply the probabilities (95% x 95% ≈ 90%), assuming the two comparisons are independent. If you consider more substituents the problem only gets worse. We hope you’re still with us and apologise profusely for letting things get so tediously technical. We are forced to admit that we’re still no closer to figuring out how, why or whether we should be using the results in Table 1 of the featured article. Please let us know if you are.
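The multiplication above is easy to extend to more substituents. The sketch below takes the arithmetic at face value, which means assuming that the comparisons are independent of one another (they all share methyl as the reference, so this is an approximation at best). The function names are ours. We have also included the Bonferroni correction, one textbook remedy in which the significance threshold is divided by the number of comparisons; we have no idea whether the authors of the featured article applied anything of the sort.

```python
def chance_all_hold(n_comparisons, confidence=0.95):
    """Probability that every one of n independent comparisons is right,
    given that each one is right with the stated confidence."""
    return confidence ** n_comparisons


def bonferroni_threshold(n_comparisons, alpha=0.05):
    """Per-comparison significance threshold that keeps the overall chance
    of at least one false alarm at or below alpha."""
    return alpha / n_comparisons


# 52 is the number of substituents other than methyl in Table 1.
for n in (1, 2, 10, 52):
    print(
        f"{n:2d} comparisons: all hold with probability {chance_all_hold(n):.3f}, "
        f"Bonferroni threshold {bonferroni_threshold(n):.5f}"
    )
```

Under these assumptions, being 95% sure of each of 52 comparisons leaves you less than 10% sure of all of them at once.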
Sorry that it all turned into a bit of a slog but we really must move on. You will recall that the data for the analysis has been aggregated across up to 30 assays. The nature of molecular recognition is that sometimes a substituent will increase potency, sometimes it will decrease it and sometimes it will have no effect at all. Medicinal chemists are most interested in the first case where the substituent increases potency. The second situation is still relevant if you can think of a suitable ‘anti-substituent’. For example if you find that putting methyl on an aromatic carbon costs a lot of potency you might try replacing that carbon with nitrogen in case there is a hydrogen bond donor in the binding site whose solvation has been compromised by having a methyl group thrust at it. Probably a bit of long shot (we’re assuming no protein structure is available) but we think it’d still be a better bet than ethyl, butyl or futyl. However when you average over both chemotype and assay, you are unnecessarily adding noise to your signal. In this case, do you really expect to end up with anything other than the unremarkable and underwhelming Table 1?
There is yet another problem. The dynamic range of assays is limited. If you have a substituent that tends to have a dramatic effect on potency then it will be less likely that you’ll be able to measure potency for both parent and substituted analog. Let’s take a look at the F(-1) and F(1) values for carboxylate that are given in Table 1. The value of F(1) is 0.247 which tells us that a quarter of the time adding a carboxylate to an aromatic ring leads to at least a log unit drop in potency. This is not surprising given that a carboxylate is not a gift that you really want to offer to a protein unless you’re sure that it will be properly appreciated. The value of F(-1) is 0.056 which is very similar to the corresponding figure (0.053) for methyl. Now let’s assume that we have a situation in which the carboxylate is an essential part of the pharmacophore. The question you really need to ask yourselves is how confident are you that you can measure potencies for both the parent compound and the analog when your substituent is carboxylate. The next question is, knowing what you do about molecular recognition, would you be more or less optimistic about being able to measure the effect of methyl substitution on potency?
Now it’s time to get back to the tails. These are probably the most interesting regions within the distributions because they provide information about the best (and worst) we can expect to achieve by making a substitution. Unfortunately the authors decided to trim the distributions prior to data analysis by removing potency changes that exceeded 4 standard deviations. Think of all the poorly understood molecular recognition phenomena (topographically-focussed hydrophobic enclosure, electrostatically-enhanced conformational locking, hyperpolarised charge-octupole interactions, hyperconjugation-relayed field gradients) that might be lurking in those discarded tails. What if some of these discarded results could have been interpreted in terms of structure of the target proteins?
So there you have it. The effects on potency of a number of common substituents and we never even got beyond Table 1. Essential information for drug discovery or philatelic use of the pages of a high impact journal? We are simple folk and we leave it to Our Loyal Readers to decide.
Thankfully, we have only rarely encountered Doberman Pinschers. Our limited experience of this unsavoury breed suggests that they typically throw the best bits away when tail and Pinscher are separated.
Monday, April 7, 2008
Performance metrics for substituents
The essence of molecular design is being able to predict what effects structural modifications will have on the molecular properties that you’re interested in. It’s obviously great if you can actually predict the properties themselves but predicting changes in properties may be easier and you’ve always got the option to perform some measurements on the starting points for the optimisation process. During our recent entanglement with hydrogen bonds, an article with a promising title appeared in a reasonably well-known journal with an impact factor fully worthy of the attentions of The Crapshoot. Eagerly we read on, pausing briefly to ponder the relevance of reference 2 that the authors had apparently selected at random for citation.
When you’re quantifying the effects of structural changes on properties you first need to define the changes. For aromatic rings, hydrogen is the obvious reference substituent. So if you want to find out what a methyl group does for potency then you need to find all the molecular pairs in your potency database in which one molecule has a methyl and the other is identical except that hydrogen replaces methyl. This is pretty much the approach of the authors of our featured article. Let’s take a look at their results for acyclic substituents on aromatic rings which you’ll find in Table 1 of the article.
Potency is the focus of the article and it’s common to use pIC50 (-log of IC50) when comparing potencies. The effect of a methyl group on potency is given by:
pIC50 (X=Methyl) – pIC50 (X=Hydrogen)
Once you’ve identified a number of these pairs, you can do all the normal statistical stuff like calculating means and standard deviations and that’s what the authors did. They also aggregated the results for a number of different assays so the effects of structural changes are averaged over up to 30 different assays. Hope you’re all still with us! Now let’s take a look at what they found.
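For the avoidance of doubt about what gets averaged, here is a minimal sketch of the bookkeeping. The IC50 values are hypothetical and invented by us, the names are ours, and real matched-pair analysis would of course first have to find the pairs in a database. The sketch converts IC50 to pIC50, takes the methyl minus hydrogen difference for each pair and then does the normal statistical stuff.

```python
import math
from statistics import mean, stdev


def pic50(ic50_molar):
    """Convert an IC50 in mol/L to pIC50 (-log10 of IC50)."""
    return -math.log10(ic50_molar)


# Hypothetical matched pairs: (IC50 with X = hydrogen, IC50 with X = methyl),
# both in mol/L. Invented values, purely for illustration.
matched_pairs = [
    (1.0e-6, 1.0e-7),  # methyl gains a log unit
    (1.0e-7, 1.0e-7),  # methyl does nothing
    (1.0e-8, 1.0e-7),  # methyl costs a log unit
    (1.0e-6, 1.0e-8),  # methyl gains two log units
]

# Effect of methyl: pIC50 (X=Methyl) - pIC50 (X=Hydrogen)
methyl_effects = [pic50(methyl) - pic50(hydrogen) for hydrogen, methyl in matched_pairs]

print(f"mean effect:        {mean(methyl_effects):.3f}")
print(f"standard deviation: {stdev(methyl_effects):.3f}")
```

Note how a modest mean can sit on top of a standard deviation that is larger than the mean itself, which is the situation one should expect when sometimes the substituent helps, sometimes it hurts and sometimes it does nothing.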
The mean effect on potency ranged from -0.261 to 0.498 and the standard deviation ranged from 0.518 to 1.186. This is exactly the sort of result that you’d expect because the averaging has been performed over multiple assays and multiple chemotypes. We looked at the means and standard deviations in Table 1 and wondered how we might exploit them in molecular design. We are still wondering but of course we are but simple folk.
The authors of the featured study must have been thinking along similar lines. The collection of means and standard deviations is of a distinctly philatelic aspect. Not really the sort of thing that you can present to a journal of high impact factor as being at the cutting edge. What is one to do in situations like this? The answer is to present more statistics and that’s exactly what the authors did. And here’s where it gets complicated so please pay close attention as we try to guide you through the minefield.
They defined two descriptors for the distribution associated with each substitution. F(-1) is the probability of increasing potency by at least one log unit and F(1) is the probability of decreasing potency by at least one log unit. The sign convention reflects the authors’ use of logIC50 rather than pIC50 but this is really not a problem. Each of these descriptors partitions each data set into two groups thus providing access to that most famous last refuge of the scoundrel: the contingency table.
Contingency tables are normally used to analyse categorical data. For example, you have some dead smokers and some equally dead non-smokers who have died of lung cancer or something else equally deadly. Analysis of the contingency table tells you whether more smokers than non-smokers have died of lung cancer and how significant it is. Significance of course is not especially significant for these smokers and non-smokers because they’re all dead so it’s probably a good time to refer you to a couple of our earlier posts (1 | 2) on categorical sins.
Now back to the substituents. The authors decided that methyl would be a good reference substituent and did contingency analysis for each substituent with respect to methyl. This is how it works for the F(-1) descriptor and fluoro:
Category 1: Methyl or fluoro
Category 2: Change in logIC50 <= -1 or change in logIC50 > -1
We’re happy with methyl or fluoro as a category just as we were happy with smoking/not smoking and lung cancer/other cause of death as categories. We are rather less happy about slicing up a continuous distribution as a way to define categories. We also worry that the choice of methyl as a reference substituent is a little arbitrary when you’ve got an entire deck of cards to choose from.
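Here is a sketch of how the two categories above turn into a 2x2 contingency table and a probability. The counts are hypothetical and invented by us, and the function names are ours. We have used the Pearson chi-squared test without continuity correction because it can be written in a few lines; this is our assumption, and the authors may well have used a different test.

```python
import math


def slice_counts(changes, cutoff=-1.0):
    """Count changes in logIC50 at or below the cutoff, and above it."""
    below = sum(1 for change in changes if change <= cutoff)
    return below, len(changes) - below


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and p-value
    for the 2x2 table [[a, b], [c, d]]. Every row and column total must be
    non-zero, otherwise the denominator vanishes."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = n * (a * d - b * c) ** 2 / denominator
    # For one degree of freedom the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts, invented for illustration. Each row is a substituent:
# (change in logIC50 <= -1, change in logIC50 > -1).
fluoro_row = (20, 80)
methyl_row = (5, 95)

statistic, p_value = chi_squared_2x2(*fluoro_row, *methyl_row)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.4f}")
```

In practice the rows would come from something like `slice_counts` applied to the measured changes for each substituent, which is exactly where the continuous distribution gets sliced.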
We will elaborate on these and (a number of) other concerns in the next post. We think it's great fun to split the material up like this because it helps build the suspense. Until then we offer our most effusive and unctuous thanks to all our readers, especially in the state of Illinois, for reading The Great Molecular Crapshoot.
Sunday, March 30, 2008
Blogging the literature
As our Loyal Readers will be aware, we devote a significant proportion of Crapshoot posts to placing peer-reviewed literature in the cross-hairs. Some have said that literature posts are difficult and, on the evidence of our less than prolific output and the turgidity of the resulting posts, we would have to agree.
There are a number of approaches to doing literature posts and these vary a great deal in the demands made of the blogger. The trivial literature post involves a link to an article with no more comment than 'here is a good paper'. The next level up is to summarise the article without adding any original insight or to bring some related articles together. Depending on expertise and experience, these posts can be put together fairly quickly. The two (1 | 2) appearances of the Red/Blue teams will give you an idea of what we're talking about here.
However if you want to present a serious challenge to a published article, you'll need to put in some time. Remember that it'll have got past two or three reviewers even though this will not always be apparent. Most of the literature posts in The Crapshoot attempt to identify weaknesses in published articles and we are particularly motivated by a high journal impact factor and a large number of citations. Our commentary on the proposed link between rotatable bonds and oral bioavailability is a good example of this type of post.
Much more difficult than identifying weaknesses in published literature is building on previous ideas and demonstrating their relevance in a different context. We would dearly love to do this in every single literature post and if you can do this consistently you really should be writing for a review journal. The closest we believe we got to achieving this was in the rule of 5 and molecules for simpletons posts. But even there we fell well short of the ideal.
So why do we bother with literature posts? Some find this activity helps them gain a better understanding of the literature. But this is not our motivation. We post because much in the literature is accepted within Pharma as absolute fact. Sheep or lemmings? We leave it to you to decide.
But enough of these musings because we have much bigger fish to fry.
Wednesday, March 26, 2008
Blogroll purge
It's Night Of The Long Knives at The Crapshoot as we purge The Blogroll. Most of the purgees have not posted for a while and many of our loyal Readers will scarcely notice that we have done anything at all.
However we have also decided to de-Blogroll the Sceptical Chymist. Spare a thought for the folk in publishing who put out blogs like these. First, somebody else is likely to have decided that you're going to blog. Then you can't say anything negative about anything that gets written in any of your journals because it would imply that one of your colleagues wasn't doing his/her job properly. Next you can't say anything negative about anything that gets written in somebody else's journals because two (or more!) can play the game of escalatio and having gotten into a spat with your opposite numbers at a journal with lower impact factor is not going to help at annual review time.
We accept it's going to be difficult for a publisher blog to be as sceptical as Robert Boyle. We have done our best to help and on one occasion triggered this interesting exchange. We did not expect any thanks for injecting some fizz into an anodyne literature posting and this was just as well because we didn't get any. Much more deserving of thanks was one of the authors of the featured literature who took the time to respond to our anonymous comments. We would like to see more of this sort of thing and perhaps if journals provided good facilities to comment on published articles we might do so in a less anonymous manner.
Sunday, March 9, 2008
Neutral-neutral hydrogen bonds: The verdict
Well we do seem to have let things slip a bit. Truth be told, we’re getting a little sick of hydrogen bonds and are longing to get back to important things like philately, druglikeness and lean six sigma. However we do take our responsibilities to our loyal readers seriously and will finish that with which we have tasked ourselves. For those of you who’ve only just joined us, this is the conclusion of a series (1 | 2 | 3 | 4 | 5) of Crapshoots that has focussed on the assertion that a neutral-neutral hydrogen bond will contribute no more than 1.5kcal/mol to the stability of a protein ligand complex. What are the precise origins of this figure and how did it come to be asserted quite so confidently? These questions are actually as much about the functioning of the scientific establishment as they are about hydrogen bonding and this is the real reason for our interest in this work.
The figure of 1.5kcal/mol made its debut in 1985. It was based on 3 neutral-neutral hydrogen bonds, all of which involve a hydroxyl group either on ligand or protein (see Crapshoot). We were unconvinced that the contribution of any of these hydrogen bonds represented the maximum that we might expect for a neutral-neutral hydrogen bond. Hydroxyl groups are one reason and the large number of hydrogen bonds between protein and ligand is another. Now take a look at the last section of the article entitled ‘Biological specificity’. Are the authors just writing about their protein ligand complex or do they claim a more general scope for their findings? They refer to ‘the enzyme’ but ‘a substrate’ so we have to admit we just don’t know. We can’t believe that anyone would seriously claim to have found the upper limit in a sample of just three so we’ll assume that it’s just a matter of wording.
A year later the figure of 1.5kcal/mol cropped up again. This time the hydrogen bonding groups were deleted from ligand rather than the protein and coincidentally hydroxyl groups were involved in all cases (see Crapshoot). The authors suggested that their results were consistent with the reported figure of 1.5kcal/mol and we do not believe that they claimed generality for their findings. Once again we would not expect the contribution of any of these hydrogen bonds to be at the upper limit for a neutral-neutral hydrogen bond because there are too many of them (see molecular complexity argument).
Then the amide-amide hydrogen bond study appeared in 1993 (see Crapshoot). The amides have a similar problem to the hydroxyls in that deploying NH as a donor may compromise the solvation of the carbonyl oxygen acceptor. We identified a couple of issues with the analysis and do not believe that these measurements support the adoption of 1.5kcal/mol as an upper limit for the contribution of a neutral-neutral hydrogen bond.
While the articles featured represent valuable contributions to the literature, the number of hydrogen bonds sampled is low and the variety narrow. Far too low and narrow for us to be confident that the maximally contributing hydrogen bond was represented in the sample. Yet the assertion was made that 1.5kcal/mol was as good as it would get. Why was this? Did the makers of that assertion think that the results of the primary literature were of broader scope than they actually were? Did they consider how representative these hydrogen bonds were for drug-protein complexes? Did 1.5kcal/mol sit more comfortably in their review than say 2.5kcal/mol? Did they get spooked by the impact factor of the journal in which the figure of 1.5kcal/mol first appeared?
Lots of unanswered questions but here’s an interesting thought experiment. Imagine you have a protein and you convert an amide NH into an ester oxygen. Suppose that the amide NH functions as a hydrogen bond donor and you believe that a hydrogen bond contributes no more than 1.5kcal/mol. Then the ester should be no more than 1.5kcal/mol less stable than the wild type protein. Wait, we hear you cry, the ester carbonyl oxygen will be a weaker acceptor than amide oxygen. Fair point, we respond, so we’ll let you set the hydrogen bond involving the amide oxygen to the maximum of 1.5kcal/mol and that involving the ester oxygen to zero which is actually a very big concession. So hopefully you’ll agree that converting an amide in a protein to an ester is not going to destabilise the protein by more than 3kcal/mol.
Now, as you’ll have guessed, this isn’t just a thought experiment. Folk have actually mutated amides into esters and measured effects on protein stability (see Table 2). Now a quick scan of the relevant table will show that this mutation reduces protein stability by over 3kcal/mol in a number of cases, the largest figure being 4.8kcal/mol. Even the most innumerate will concede this is a little larger than 3kcal/mol. There is of course a small detail that we haven’t mentioned and if nobody comments on it we’ll leave it at that.
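For the benefit of the most innumerate, the bookkeeping of the thought experiment can be written out as follows. The variable names are ours and the only numbers used are those already quoted in this post.

```python
# Asserted upper limit for one neutral-neutral hydrogen bond, in kcal/mol.
MAX_PER_HYDROGEN_BOND = 1.5

# Hydrogen bonds conceded as lost on going from amide to ester: the one
# donated by the NH, plus the one accepted by the carbonyl oxygen, which we
# generously set to zero in the ester.
hydrogen_bonds_lost = 2

predicted_ceiling = MAX_PER_HYDROGEN_BOND * hydrogen_bonds_lost

# Largest destabilisation quoted above for an amide to ester mutation, kcal/mol.
largest_reported = 4.8

excess = largest_reported - predicted_ceiling
print(f"predicted ceiling: {predicted_ceiling:.1f} kcal/mol")
print(f"largest reported:  {largest_reported:.1f} kcal/mol")
print(f"excess over limit: {excess:.1f} kcal/mol")
```

So even with the big concession, the largest measured effect overshoots the supposed ceiling by well over a single maximal hydrogen bond.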
This concludes our long and tortured look at neutral-neutral hydrogen bonds. We have traced the figure of 1.5kcal/mol from debut in 1985 to being quoted as an upper limit for all neutral-neutral hydrogen bonds. We hope that you will now take a closer look at what lies beneath when numbers like these get presented as facts.