<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-8876332030448981936</id><updated>2012-02-16T17:10:59.695-08:00</updated><category term='pictures'/><category term='pfizer'/><category term='red team'/><category term='molecular descriptors'/><category term='gossip'/><category term='jmc'/><category term='sacred cows'/><category term='gsk'/><category term='systems biology'/><category term='tutorial'/><category term='qsar'/><category term='astex'/><category term='animal models'/><category term='oral drugs'/><category term='house keeping'/><category term='vertex'/><category term='rule of 2'/><category term='nrdd'/><category term='validation'/><category term='privileged substructure'/><category term='ddt'/><category term='jcamd'/><category term='fragment screening'/><category term='rule of 3'/><category term='travel'/><category term='amusing or bizarre'/><category term='rule of 5'/><category term='opinion'/><category term='stamp collecting'/><category term='data analysis'/><category term='predictive modelling'/><category term='enrichment'/><category term='metric'/><category term='privileged fragment'/><category term='organisational'/><category term='pharmacokinetics'/><category term='az'/><category term='hydrogen bonding'/><category term='pharma life'/><category term='blue team'/><category term='latent indicator variable'/><category term='molecular recognition'/><category term='update'/><category term='categorical sin'/><category term='synthesis'/><category term='literature reviews'/><title type='text'>The Great Molecular Crapshoot</title><subtitle type='html'>Here we take a look at some of the art, science and dogma associated with Drug 
Discovery.  Comments are very welcome but keep them clean because this is a family-friendly blog.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>76</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5216893273382336264</id><published>2012-02-09T01:42:00.000-08:00</published><updated>2012-02-09T15:54:49.139-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oral drugs'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='stamp collecting'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='gsk'/><category scheme='http://www.blogger.com/atom/ns#' term='ddt'/><category scheme='http://www.blogger.com/atom/ns#' term='molecular descriptors'/><title type='text'>By the pricking of my thumbs, something aromatic this way comes...</title><content type='html'>So we accepted an &lt;a href="http://gmc2007.blogspot.com.au/2012/01/solubility-forcast-index-awarded-57-for.html" target="_window"&gt;invitation to get physical in drug 
discovery&lt;/a&gt; although we are not sure that this was such a good idea.  However, before we can return to the &lt;a href="http://dx.doi.org/10.1016/j.drudis.2010.05.016" target="_window"&gt;Solubility Forecast Index &lt;/a&gt;(SFI), we first need to look at an &lt;a href="http://dx.doi.org/10.1016/j.drudis.2009.07.014" target="_window"&gt;earlier study &lt;/a&gt;by colleagues of SFI’s creators to get a feel for the origins of the focus on aromatic rings.  Following the relevant citation in ‘getting physical’, we aimed to find out whether too many aromatic rings are a liability in drug design.  Why don’t you join us as we take a tour through the article?&lt;br /&gt;   &lt;br /&gt;One of the first things that you’ll notice is that one of the authors is described as ‘a medicinal chemistry design expert’, but don’t let this intimidate you because you’ve already learned &lt;a href="http://gmc2007.blogspot.com.au/2010/09/experts-and-how-to-avoid-them.html" target="_window"&gt;how to deal with experts&lt;/a&gt;.  The authors state in the abstract:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;“On the basis of this analysis, it was concluded that the fewer aromatic rings contained in an oral drug candidate, the more developable that candidate is likely to be; in addition, more than three aromatic rings in a molecule correlates with poorer compound developability and, thus, an increased risk of attrition in development.”&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;It’s going to be interesting to see how they quantify ‘developable’, so let’s read on!&lt;br /&gt;&lt;br /&gt;We’d like to start by looking at Figure 2, which shows a box plot of measured solubility against aromatic ring count, and you might want to check out Figure 1 if you’re not clear about what all the funky graphics mean.  We certainly agree that solubility decreases as the number of aromatic rings increases, although it is difficult with a plot like this to tell exactly how strong the trend is.  
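&lt;br /&gt;&lt;br /&gt;For those who like to see this sort of thing with numbers, here is a minimal Python sketch (it needs Python 3.8 or later for statistics.quantiles) of why skewed boxes are a nuisance. The eight solubility values are invented for illustration and are not taken from the article, and box_summary is our own helper. A single very soluble compound is enough to drag the mean of a box above its third quartile (Q3), and taking logarithms pulls the mean back inside the box.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
import math
import statistics

# Hypothetical solubility values (µM) for the compounds in one box of a box
# plot. These numbers are invented for illustration, not taken from the article.
solubility_um = [1, 2, 3, 5, 8, 10, 15, 400]


def box_summary(values):
    """Return (Q1, median, Q3, mean) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return q1, median, q3, statistics.mean(values)


raw = box_summary(solubility_um)
logged = box_summary([math.log10(s) for s in solubility_um])

print("raw   Q1=%.2f median=%.2f Q3=%.2f mean=%.2f" % raw)
print("log10 Q1=%.2f median=%.2f Q3=%.2f mean=%.2f" % logged)
```
&lt;/pre&gt;On these made-up numbers the raw mean (55.5µM) sits far above Q3 (13.75µM), while on the log10 scale the mean lies comfortably between Q1 and Q3.&lt;br /&gt;&lt;br /&gt;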
When presenting data using a graphic like this, it’s a good idea to arrange things so that the variance in each box is as constant as possible, and plotting solubility on a logarithmic scale is likely to help in this regard.  Transforming the data in this manner will also reduce the skewness of the distributions in the boxes, such as the one corresponding to 5 aromatic rings, for which the mean is actually greater than Q3.  However, we don’t want to dwell on this, or on the analysis of albumin binding in Figure 7, too much because it is clear from a &lt;a href="http://dx.doi.org/10.1016/j.drudis.2010.11.014" target="_window"&gt;subsequent publication &lt;/a&gt;that these authors have already seen the error of their ways on this particular issue, and we really want to move on.&lt;br /&gt;&lt;br /&gt;Let’s take a quick look at Figure 3, which shows (yet) another box plot and six pie charts that get redder (and less green) as the number of aromatic rings increases.  The green bits represent the proportion of compounds with ClogP &amp;lt; 3 and the red bits the proportion with ClogP &amp;gt; 3, so the pie charts show what the box plot shows, which is that ClogP increases with the number of aromatic rings.  The authors assert an excellent correlation between lipophilicity and aromatic ring count, although their reluctance to quantify this with a correlation coefficient should set some alarm bells ringing.  They also note that &lt;em&gt;“the addition of an aromatic ring usually results in a discrete and statistically significant jump in c log P”&lt;/em&gt;, which certainly confused us since we are unaware of what makes some jumps discrete and others otherwise.  We recalled the “&lt;em&gt;clearer stepped differentiation within the bands&lt;/em&gt;” from SFI and wondered whether anybody where these chaps work ever uses correlation coefficients.  
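&lt;br /&gt;&lt;br /&gt;Putting a number on an ‘excellent correlation’ takes only a few lines.  The Python sketch below assumes nothing more than paired values of aromatic ring count and ClogP; the ten pairs are invented for the purpose and are not the authors’ data, and pearson_r is our own helper rather than a library function.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
import math

# Hypothetical (aromatic ring count, ClogP) pairs, invented for illustration only.
ring_count = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
clogp = [1.0, 3.5, 0.5, 2.0, 4.5, 1.5, 3.0, 5.0, 2.5, 4.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from centred sums of squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(ring_count, clogp)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```
&lt;/pre&gt;These invented pairs give r of about 0.55 (r&lt;sup&gt;2&lt;/sup&gt; of about 0.30), which is the kind of figure we would have liked to see reported alongside the pie charts.&lt;br /&gt;&lt;br /&gt;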
A picture may indeed be worth a thousand words, but surely one correlation coefficient is worth more than six pie charts.&lt;br /&gt;&lt;br /&gt;So let’s move on.  We’d like you to take a look at Figure 5.  Looks familiar, doesn’t it?  Well, &lt;a href="http://gmc2007.blogspot.com.au/2008/05/breaking-stone-in-changi.html" target="_window"&gt;here’s where you’ve seen it before&lt;/a&gt;.  Don’t correlations look so much better when you’ve hidden the variation!  As loyal and cultured readers of this column, you will be deafened by the cacophony of warning bells whenever you see data presented in this manner.&lt;br /&gt;&lt;br /&gt;However, there is a more serious problem with all of this.  The number of aromatic rings is a measure of molecular size and, for the compounds in pharmaceutical databases, it is likely to be correlated with other measures of molecular size such as molecular weight, number of non-hydrogen atoms, molecular volume and molecular surface area.  Now you can see the problem.  If the authors had selected one of these other properties and found a correlation, we would be discussing an article entitled “The impact of molecular volume on compound developability...” and the Great Molecular Crapshoot would be haranguing them for not checking for the influence of aromatic ring count.  If you’re going to assert that aromatic ring count is somehow special, then you’re effectively saying that if we control the number of aromatic rings we can do whatever we want with molecular weight.  There are things that one can do to investigate whether aromatic ring count is doing more than just contributing to molecular size, but these would require a data-analytic capability beyond that which the authors have demonstrated here.  It would all be less of a problem if the correlations between aromatic ring count and properties like solubility were very strong.  
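&lt;br /&gt;&lt;br /&gt;One such check is a partial correlation, which asks whether aromatic ring count still tracks solubility once a measure of size has been allowed for.  Below is a minimal Python sketch, assuming heavy-atom count as the size descriptor and using six invented compounds that were deliberately constructed so that log solubility depends on size alone; the helper names are ours and the numbers are not from the article.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
import math

# Invented data for six hypothetical compounds. log_s was constructed to depend
# on size (heavy-atom count) alone, plus scatter unrelated to ring count.
rings = [1, 1, 2, 3, 3, 4]
heavy_atoms = [20, 24, 28, 32, 36, 40]
log_s = [-0.9, -1.36, -1.88, -2.36, -2.7, -2.8]


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from centred sums of squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


raw_r = pearson_r(rings, log_s)
size_adjusted_r = partial_r(rings, log_s, heavy_atoms)
print(f"raw r(rings, logS)            = {raw_r:.2f}")
print(f"partial r(rings, logS | size) = {size_adjusted_r:.2f}")
```
&lt;/pre&gt;Here ring count shows a raw correlation with log solubility of about -0.96 that vanishes (to rounding error) once heavy-atom count is partialled out.  Real data will rarely be this tidy, and a linear partial correlation is only one of several ways to ask the question.&lt;br /&gt;&lt;br /&gt;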
However, if the correlations were indeed strong, we suspect that the authors would have been quoting some numbers rather than waffling about discrete and statistically significant jumps. &lt;br /&gt;&lt;br /&gt;So where does this analysis lead?  The authors suggest the following mnemonic for oral drug discovery programs:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;“The fewer the number of aromatic rings contained in an oral drug candidate, the more developable that candidate is likely to be; specifically, more than three aromatic rings in a molecule correlates with poorer compound developability and, therefore, an increased risk of compound attrition.”&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Reading this, we couldn’t help thinking that the tortured grammatical construction of this mnemonic appeared to be somewhat at odds with its being described as a mnemonic.  We asked ourselves how we might use this mnemonic in real-life Drug Discovery.  Should we still worry about tiresome details like lipophilicity and molecular weight if the compound has three or fewer aromatic rings?  What should we do when the compound with three aromatic rings is actually less soluble than the one with four?  Is the mnemonic relevant to target-related attrition?  How is developability defined, and does it depend at all on the ability of the compound to hit the target?  What were the reviewers of this manuscript smoking when they let it through?&lt;br /&gt; &lt;br /&gt;Those of you still reading this piece are probably thinking that we’re being a bit harsh with all this criticism.  Couldn’t you be a bit more constructive, we hear you cry, and, Loyal Readers, we concede that you may have a point.  We think this whole developability business needs to be more physical.  In other words, we need more equations and more physics, so we have devised a new descriptor to do precisely that.  
We propose using the number of neutrons in a molecule (which we suggest calling N&lt;sub&gt;n&lt;/sub&gt;) as a measure of its developability, and we eagerly await a run on the lighter isotopes as the Pharma companies dig themselves into their patent bunkers.  What could be more physical than neutrons?&lt;br /&gt;&lt;br /&gt;That’s where we’ll leave it for now, but don’t go too far because we will soon be returning to the Solubility Forecast Index...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5216893273382336264?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5216893273382336264/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5216893273382336264' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5216893273382336264'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5216893273382336264'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2012/02/by-pricking-of-my-thumbs-something.html' title='By the pricking of my thumbs, something aromatic this way comes...'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5325065003575623522</id><published>2012-01-27T05:19:00.000-08:00</published><updated>2012-01-27T20:29:02.848-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='categorical sin'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='stamp collecting'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='gsk'/><category scheme='http://www.blogger.com/atom/ns#' term='ddt'/><title type='text'>Solubility forecast index awarded 5.7 for artistic expression...</title><content type='html'>We hope that you enjoyed the recent &lt;a href="http://gmc2007.blogspot.com/2012/01/lipophilicity-primer.html" target="_window"&gt;Primer on Lipophilicity&lt;/a&gt; and found reading it to be edifying and educational.  Although it really is an honour to write pieces like that one for such clever, cultured readers, we do need to return to the style with which our loyal readers associate us.  The &lt;a href="http://dx.doi.org/10.1016/j.drudis.2010.05.016" target="_window"&gt;article&lt;/a&gt; that today blunders into the crosshairs bills itself as a contemporary perspective on solubility and hydrophobicity, although we wonder if its authors truly get physical in drug discovery.  The article, as you might guess from the title, explores relationships between solubility and logP or logD, and it is instructive to read what the authors have to say about their data-analytic philosophy:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;“Data plots with lines of best fit and unity gave a representation of the data, albeit with a statistical analysis, which did not adequately convey the distribution of data because of the large numbers. 
The distribution of values was better conveyed through normalized bar graphs and box plots using binned hydrophobicity and/or solubility values, which better represent the distribution of data in a more visually amenable manner.”&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;To paraphrase:  We couldn’t find what we wanted when we analysed the data, so we drew some pictures instead.&lt;br /&gt;&lt;br /&gt;OK, this assessment may seem harsh, and we do admit that plotting data is certainly a good thing, especially as a precursor to analysis.  However, we have shown you previously that weak trends can be made to look a whole heap stronger by &lt;a href="http://gmc2007.blogspot.com/2008/05/breaking-stone-in-changi.html" target="_window"&gt;hiding&lt;/a&gt; or &lt;a href="http://gmc2007.blogspot.com/2008/07/desperately-seeking-signifcance.html" target="_window"&gt;masking&lt;/a&gt; variation, and when you plot data enough you can end up &lt;a href="http://gmc2007.blogspot.com/2008/08/pope-atheist-and-irishman-called-dave.html" target="_window"&gt;seeing what you think should be there&lt;/a&gt;.  Also, if you’ve got enough data, even the weakest trend becomes statistically significant, and we respectfully draw the attention of our readers to the tale (as opposed to the tail) of the &lt;a href="http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-conclusion.html" target="_window"&gt;55% coin&lt;/a&gt;.  When presenting trends, it’s really important to remember that a trend’s strength is even more important than its mere existence.&lt;br /&gt;  &lt;br /&gt;So let’s get back to business, and we’d like you to take a look at Figures 6a and 6b, which illustrate the relationships between aqueous solubility and two different calculated lipophilicities, namely logP and logD&lt;sub&gt;pH7.4&lt;/sub&gt;, both predicted using the ACD software.  Solubility is ‘quantified’ as a series of bars that indicate the relative proportions of compounds in poor, intermediate and good categories.  
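&lt;br /&gt;&lt;br /&gt;Since binning is at the heart of all this, here is a minimal Python sketch of how averaging within bins hides variation.  The nine values are invented, and pearson_r is our own helper: the correlation computed from individual compounds is modest, but the correlation computed from the three bin means is perfect, simply because the within-bin scatter has been thrown away.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
import math
from collections import defaultdict

# Invented data: a descriptor already sorted into three bins and a noisy
# response (think log solubility) for nine hypothetical compounds.
descriptor_bin = [1, 1, 1, 2, 2, 2, 3, 3, 3]
response = [0.0, -4.0, -2.0, -1.0, -5.0, -3.0, -2.0, -6.0, -4.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from centred sums of squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Correlation using every individual compound.
r_individual = pearson_r(descriptor_bin, response)

# Correlation using only the mean response in each bin; the within-bin
# scatter is discarded before the correlation is calculated.
by_bin = defaultdict(list)
for b, y in zip(descriptor_bin, response):
    by_bin[b].append(y)
bins = sorted(by_bin)
bin_means = [sum(by_bin[b]) / len(by_bin[b]) for b in bins]
r_bin_means = pearson_r(bins, bin_means)

print(f"r using individual compounds: {r_individual:.2f}")
print(f"r using bin means:            {r_bin_means:.2f}")
```
&lt;/pre&gt;On these numbers r is about -0.45 for the individual values and -1.0 for the bin means.  Proportions in a bar chart are not the same thing as bin means, but they discard the within-bin scatter in much the same way.&lt;br /&gt;&lt;br /&gt;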
So hopefully you’re still with us, but please speak up if not.  The lipophilicity values have been sorted into bins and, as regular readers of the Crapshoot, you’ll be wondering why they don’t just plot the data instead of putting it into all these bins.  Now, when you look at Figures 6a and 6b, you might think that the data is evenly distributed across the bins, but if you look at the fine print on top of each bar, you’ll see that this is most definitely not the case.  Furthermore, when you compare these numbers for corresponding bins in the two plots, you’ll see that the data is distributed differently across the bins in the two plots.  Not that you’d guess that from just looking at the plots, and it does make meaningful visual comparison of the plots difficult.&lt;br /&gt;&lt;br /&gt;So the authors would have us believe that ACD logD&lt;sub&gt;pH7.4&lt;/sub&gt; is more effective than ACD clogP as a predictor of aqueous solubility.  Let’s take a look at how they do this.  Basically, the ‘analysis’ consists of looking at the bar charts in Figures 6a and 6b and stating:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;“The clearer stepped differentiation within the bands is apparent when log D&lt;sub&gt;pH7.4&lt;/sub&gt; rather than log P is used, which reflects the conisderable [sic] contribution of ionization to solubility.”&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;In other words, a beauty contest for charts.&lt;br /&gt;&lt;br /&gt;However, we’re not quite done yet, because we still need to take a look at the Solubility Forecast Index (SFI), although we have a nasty feeling that we’re not going to like it when we do.  SFI is defined as the sum of clogD&lt;sub&gt;pH7.4&lt;/sub&gt; and the number of aromatic rings (#Ar), and the bar chart equivalent to Figures 6a and 6b is shown in Figure 9.   
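&lt;br /&gt;&lt;br /&gt;As defined, SFI is trivial to compute, and it is just as easy to count how many compounds land in each bin, which is the information relegated to the fine print on top of the bars.  The Python sketch below assumes unit-width bins and eight invented (clogD&lt;sub&gt;pH7.4&lt;/sub&gt;, #Ar) pairs; the function name sfi is ours.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
import math
from collections import Counter


def sfi(clogd_74, n_aromatic_rings):
    """Solubility Forecast Index as described in the text: clogD(pH 7.4) + #Ar."""
    return clogd_74 + n_aromatic_rings


# Invented (clogD7.4, number of aromatic rings) pairs for eight hypothetical compounds.
compounds = [(1.2, 1), (2.5, 2), (0.3, 1), (3.1, 3), (2.2, 2), (4.0, 3), (1.8, 2), (-0.5, 1)]

sfi_values = [sfi(clogd, n_ar) for clogd, n_ar in compounds]

# Unit-width bins labelled by their lower edge; empty bins never appear.
bin_counts = Counter(math.floor(v) for v in sfi_values)
for lower in sorted(bin_counts):
    print(f"SFI {lower} to {lower + 1}: {bin_counts[lower]} compound(s)")
```
&lt;/pre&gt;With these made-up values, one bin holds two compounds, the others hold one each and the bin from 5 to 6 does not appear at all, which is exactly the sort of unevenness that a normalised bar chart conceals.&lt;br /&gt;&lt;br /&gt;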
We are going to take a much, much closer look at SFI in another Crapshoot, but for now let’s just see what the authors have to say about the bar charts:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;“This graded bar graph (Figure 9) can be compared with that shown in Figure 6b to show an increase in resolution when considering binned SFI versus binned c log D&lt;sub&gt;pH7.4&lt;/sub&gt; alone.”&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;So we guess you’re all wondering what the difference is between “clearer stepped differentiation within the bands” and “an increase in resolution”.  Please let us know if you find out, because we’d love to know as well.  We’d also like to know exactly how the authors define resolution, because to speak of an increase in resolution is to make a quantitative statement.  We really don’t have an answer to this question so, as an instructive exercise, we suggest that our readers might attempt to describe the relationship between Figure 6a and Figure 9.  Bonus points will be awarded for answers presented in limerick format.&lt;br /&gt; &lt;br /&gt;Of course, the raison d'être of the Crapshoot is not just to seek the funny side of Drug Discovery; we also like to provide practical advice that will be seen as helpful and constructive.  We advise the authors to seek the opinion of a professional statistician as to whether beauty contests for bar charts constitute a valid method for asserting that one parameter provides a quantitatively better description than another of solubility (or indeed any other property of interest).  We also believe that editors of journals greatly value feedback from those who occasionally read those journals, so we offer the following advice.  
Find out who reviewed the manuscript for you and make sure that they don't do any more.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5325065003575623522?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5325065003575623522/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5325065003575623522' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5325065003575623522'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5325065003575623522'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2012/01/solubility-forcast-index-awarded-57-for.html' title='Solubility forecast index awarded 5.7 for artistic expression...'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5223122026377238276</id><published>2012-01-19T03:05:00.000-08:00</published><updated>2012-01-27T05:19:19.194-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='tutorial'/><category scheme='http://www.blogger.com/atom/ns#' term='molecular descriptors'/><title type='text'>A primer on lipophilicity</title><content type='html'>Well it has been a while since we last posted and it is not too far into 2012 for us to wish you a very happy new year that 
is free of categorical sin.   In the next post, we’re going to take a look at relationships between aqueous solubility and lipophilicity, so we thought that a lipophilicity primer would represent a noble public service.  The most important quantity when discussing lipophilicity is the partition coefficient, P, which for a compound that lacks ionisable groups can be measured by shaking the compound with water and an organic solvent (usually 1-octanol) that does not mix with water, and we’ll only be discussing measurements in the octanol/water system in this post.  When you’ve shaken everything long enough for the compound to equilibrate properly between the solvents, you can stop shaking and wait for the octanol and water to separate into two layers.  Once this has happened, we just need to measure the concentration of the compound in each solvent and calculate logP (we generally use the logarithm of P rather than P itself) using equation 1.  The measurement of partition coefficients is pretty routine these days, and there is even a piece of kit called a shake flask, which means that you can go for a well-earned coffee rather than having to shake a separating funnel.  However, please remember that you can only use equation 1 to calculate logP for compounds that are predominantly neutral at the pH of the experiment.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-W5vvucp50x4/Txf6AHTdRpI/AAAAAAAAACw/WKfwQQdfWh8/s1600/eqn1.jpg"&gt;&lt;img style="WIDTH: 400px; HEIGHT: 41px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5699298733328254610" border="0" alt="" src="http://3.bp.blogspot.com/-W5vvucp50x4/Txf6AHTdRpI/AAAAAAAAACw/WKfwQQdfWh8/s400/eqn1.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;When the compound has ionisable groups, the situation gets a bit more complicated.  But only a bit, so please don’t worry, because we’ve been through situations like these before together.  
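&lt;br /&gt;&lt;br /&gt;Before we get to ionisation, note that for a predominantly neutral compound equation 1 amounts to the base-10 logarithm of the ratio of the concentrations in octanol and water.  A minimal Python sketch follows; the concentrations are hypothetical, both must be in the same units, and the function name log_p is ours.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
import math


def log_p(conc_octanol, conc_water):
    """logP from equilibrium concentrations in octanol and water (equation 1).

    Only meaningful for a compound that is essentially neutral at the pH of
    the experiment; both concentrations must be in the same units.
    """
    return math.log10(conc_octanol / conc_water)


# Hypothetical shake-flask result: 500 µM in the octanol layer, 5 µM in water.
print(f"logP = {log_p(500.0, 5.0):.1f}")
```
&lt;/pre&gt;A 100-fold preference for octanol corresponds to a logP of 2.&lt;br /&gt;&lt;br /&gt;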
The first thing to remember is that when we measure the lipophilicity of a compound with ionisable groups, we’re actually measuring something called the distribution coefficient, D.  This quantity D is just like P except that we use the total concentration of both neutral and ionised forms of the compound in each solvent to determine D, and the general situation can get rather messy.  When we measure lipophilicity using octanol/water partitioning, we normally assume that only the neutral form of the compound goes into the octanol and that ionised forms of the compound will stay in the water.  This assumption can break down, but we’re not going to worry about that right now because it probably (hopefully?) doesn’t happen too much.  The situation that we most frequently encounter when thinking about logD values is one in which the compound has a single ionisable group.  In this case, we can calculate logD from measured concentrations using equation 2, which we’ve rearranged to give equation 3.  &lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-axutJU_gIDc/Txf5Oa8gY0I/AAAAAAAAACk/nvasC0XZ6iU/s1600/eqn23.jpg"&gt;&lt;img style="WIDTH: 400px; HEIGHT: 141px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5699297879607239490" border="0" alt="" src="http://3.bp.blogspot.com/-axutJU_gIDc/Txf5Oa8gY0I/AAAAAAAAACk/nvasC0XZ6iU/s400/eqn23.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Let’s take a look at what equation 3 tells us.  Firstly, when there is no ionisation, the fraction of compound in its neutral form is 1, so logP and logD are identical.  Secondly, the fraction of compound in its neutral form cannot exceed 1, so its logarithm cannot exceed zero and logD cannot exceed logP.  Thirdly, equation 3 tells us how to obtain logP from measured logD values.  One approach is to measure logD at a pH at which the proportion of the compound in the ionised form is insignificant. 
For example, we could measure logD for a carboxylic acid at low pH (you’ll know that it’s low enough when logD stops increasing as you lower the pH).  The alternative is to use pK&lt;sub&gt;a&lt;/sub&gt; to calculate the fraction of compound in the neutral form and use equation 3 to calculate logP.  Suppose we measure a logD value of 2.0 at a pH of 7.4 for an amine with a pK&lt;sub&gt;a&lt;/sub&gt; of 10.4.  In the buffer, the amine is 99.9% ionised and 0.1% neutral, so the logP for this amine is 5.0 (logD minus the logarithm of 0.001, or 2.0 + 3.0).&lt;br /&gt;&lt;br /&gt;An obvious question is which of logP and logD is more relevant to Drug Discovery and, to be quite honest, we’re not sure.  The snap answer is that it depends on context.  Both logP and logD are measures of how strongly the molecules of a compound interact with water; high values reflect weak interactions with water and a tendency for those molecules to head elsewhere should they find themselves in water.  ‘Elsewhere’ can be any of a number of places, including the hydrophobic core of a lipid bilayer membrane, inside a crystal lattice or bound to a protein.  As a very rough rule of thumb, logD will be more relevant if the molecules have to ‘de-ionise’ to go ‘elsewhere’, while logP will be more relevant if the molecules go ‘elsewhere’ in their ionised states.  This all sounds like pseudo-mystical psychobabble, we hear you cry, so we’ll try to make it a bit clearer with something more specific.  Let’s start with aqueous solubility.  Many of you will remember extracting carboxylic acids from organic solvents by shaking them with aqueous sodium hydroxide.  This works because the aqueous solubility of ionisable compounds tends to be limited by the solubility of the neutral form.  Suppose a carboxylic acid with a pK&lt;sub&gt;a&lt;/sub&gt; of 4.4 has a solubility of 1000µM at a pH of 7.4.  Three pH units above the pK&lt;sub&gt;a&lt;/sub&gt;, only about 0.1% of the acid is neutral, so the concentration of neutral acid under these conditions will only be 1µM, and we can infer that the solubility of the neutral form of the acid is only 1µM.  
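&lt;br /&gt;&lt;br /&gt;The arithmetic behind this example can be sketched in a few lines of Python, assuming a monoprotic acid, Henderson-Hasselbalch behaviour and solubility limited by the neutral form; the function names are ours.  The same function also handles the pK&lt;sub&gt;a&lt;/sub&gt; of 5.4 that we are about to conjure up.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
def neutral_fraction_acid(pka, ph):
    """Fraction of a monoprotic acid present in its neutral (protonated) form."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def total_solubility_acid(intrinsic_solubility, pka, ph):
    """Total solubility when precipitation is limited by the neutral form."""
    return intrinsic_solubility / neutral_fraction_acid(pka, ph)


# Numbers from the text: 1000 µM measured at pH 7.4 for an acid with pKa 4.4.
f_n = neutral_fraction_acid(4.4, 7.4)
intrinsic = 1000.0 * f_n
print(f"neutral fraction at pH 7.4: {f_n:.4f}")
print(f"implied solubility of the neutral form: {intrinsic:.2f} µM")

# The same intrinsic solubility with the pKa nudged up to 5.4.
print(f"total solubility with pKa 5.4: {total_solubility_acid(intrinsic, 5.4, 7.4):.0f} µM")
```
&lt;/pre&gt;The neutral fraction at pH 7.4 is 1/1001, so 1000µM in total implies roughly 1µM of neutral acid, and with a pK&lt;sub&gt;a&lt;/sub&gt; of 5.4 the same intrinsic solubility supports only about 101µM in total.&lt;br /&gt;&lt;br /&gt;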
Let’s magically increase the pK&lt;sub&gt;a&lt;/sub&gt; of the acid to 5.4 (which the observant amongst you will have noticed may be a bit high for a typical carboxylic acid), leaving the solubility of the neutral form unchanged, and see what happens.  Our new acid is now only 99% ionised under assay conditions instead of 99.9% ionised, which means that the neutral form will start precipitating (supersaturation permitting) once the total concentration of acid gets to 100µM.  Increasing the pK&lt;sub&gt;a&lt;/sub&gt;, which makes the acid less acidic, results in a tenfold decrease in the measured solubility.&lt;br /&gt;  &lt;br /&gt;So we hope that this will give you a better idea of what ‘elsewhere’ means in this context.  The next question is how we should use logP and logD as descriptors for analysing data.  What you do depends a bit on what you have available and what sorts of compounds you’re dealing with.  If you have measured logD values and the compounds lack ionisable groups, then logD and logP are identical, and these measured values will be more relevant than predicted values (provided, of course, that you have a clear idea of the dynamic range of the logD measurement and its quantification/detection limits).  Life gets more complicated if you have to handle compounds with ionisable groups, because you’re unlikely to have measured pK&lt;sub&gt;a&lt;/sub&gt; values available for all the compounds that you’re interested in, and access to logP will involve a predictive element if logD has only been measured at a single pH.  You might decide that logD is more relevant than logP to your situation, in which case you can use logD.  However, when you use logD measured at a pH of 7.4 to model data, you need to remember (equation 3) that you’ll be treating an amine with a pK&lt;sub&gt;a&lt;/sub&gt; of 11.4 and a logP of 6.0 as equivalent to an amine with a pK&lt;sub&gt;a&lt;/sub&gt; of 8.4 and a logP of 3.0, since both have a logD at pH 7.4 of close to 2.0. 
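&lt;br /&gt;&lt;br /&gt;Equation 3 for a base can be sketched in Python as follows, assuming a single basic centre, that only the neutral form partitions into octanol, and that log f&lt;sub&gt;N&lt;/sub&gt; = -log10(1 + 10^(pK&lt;sub&gt;a&lt;/sub&gt; - pH)); the function names are ours.  It reproduces the amine example given earlier and the two amines that a logD-based model cannot easily tell apart.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
import math


def log_neutral_fraction_base(pka, ph):
    """log10 of the fraction of a monoprotic base present in its neutral form."""
    return -math.log10(1.0 + 10 ** (pka - ph))


def log_d_from_log_p(log_p, pka, ph):
    """Equation 3 for a base: logD = logP + log10(fN)."""
    return log_p + log_neutral_fraction_base(pka, ph)


def log_p_from_log_d(log_d, pka, ph):
    """Equation 3 rearranged to recover logP from a measured logD."""
    return log_d - log_neutral_fraction_base(pka, ph)


# Worked example from the text: logD 2.0 at pH 7.4 for an amine with pKa 10.4.
print(f"logP = {log_p_from_log_d(2.0, 10.4, 7.4):.2f}")

# Two amines that look alike to a model built on logD at pH 7.4.
for pka, log_p in [(11.4, 6.0), (8.4, 3.0)]:
    print(f"pKa {pka}, logP {log_p}: logD(7.4) = {log_d_from_log_p(log_p, pka, 7.4):.2f}")
```
&lt;/pre&gt;The first amine recovers a logP of about 5.0.  The other two amines come out at logD values of about 2.00 and 1.96: not quite identical, because the amine with a pK&lt;sub&gt;a&lt;/sub&gt; of 8.4 is only about 91% ionised at pH 7.4, but close enough that a model built on logD will barely distinguish between compounds whose logP values differ by three units.&lt;br /&gt;&lt;br /&gt;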
&lt;br /&gt; &lt;br /&gt;The situation most frequently encountered when using lipophilicity as a descriptor is the one where both logP and logD are themselves predicted.  Usually logD will be predicted from logP using an estimate of the pK&lt;sub&gt;a&lt;/sub&gt; and, if this is the case, you need to ask yourself whether it really makes sense to bundle logP and f&lt;sub&gt;N&lt;/sub&gt; together when the two quantities describe such different phenomena.  If you use logD in a predictive model, then that model will respond identically to the same change in logP or in log f&lt;sub&gt;N&lt;/sub&gt; and, if you’re thinking carefully about what you’re doing, you’ll be asking yourself whether you really want your model to be doing this.&lt;br /&gt;&lt;br /&gt;It’s getting to the point at which we should be wrapping up.  We’ll leave you with links to a &lt;a href="http://en.wikipedia.org/wiki/Partition_coefficient" target="_window"&gt;Wikipedia page&lt;/a&gt; and an &lt;a href="http://dx.doi.org/10.2174/1568026013395100" target="_window"&gt;article&lt;/a&gt; that present some of the material we’ve discussed from a different angle and in more depth, and we hope that you find them useful.  
In the next Crapshoot we’ll be returning to the critical review of the literature that you’ve come to expect of us, complete with some getting physical in drug discovery.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5223122026377238276?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5223122026377238276/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5223122026377238276' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5223122026377238276'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5223122026377238276'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2012/01/lipophilicity-primer.html' title='A primer on lipophilicity'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-W5vvucp50x4/Txf6AHTdRpI/AAAAAAAAACw/WKfwQQdfWh8/s72-c/eqn1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-3442880188039873287</id><published>2011-04-20T07:03:00.001-07:00</published><updated>2011-04-20T09:55:56.705-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='systems biology'/><category scheme='http://www.blogger.com/atom/ns#' term='pharmacokinetics'/><title type='text'>The Transit Disagreement</title><content 
type='html'>Today is a special day since it marks four years since we started this blog. We thank Our Loyal Readers (all three of them) for their continued support. &lt;br /&gt;&lt;br /&gt;Transport is very important. For example, after the Wehrmacht invaded Norway they wanted to reinforce the troops already there. This was not easy because Germany and Norway do not share a border, and so they did a deal to allow a few troops to pass through Swedish territory. You should probably think about this transport as facilitated rather than active, since the chaps from the Wehrmacht were simply goose-stepping down a concentration gradient rather than being carried in sedan chairs by their Swedish hosts.&lt;br /&gt;&lt;br /&gt;Drug discovery is similar to the Norwegian Problem, and we’re not talking about commercial whaling, which is also a Japanese problem even if both claim that they do it for ‘scientific’ reasons. You need to equip your troops properly and then get enough of them there to do the job. The objective of drug design is to ensure that your creation actually hits its intended target(s) with minimal collateral damage. If you’re designing a drug for oral dosing then getting it into the blood stream is usually a good start, because drug targets are usually in or on cells and these cells can’t get too far from the blood or else they die. Once you’ve got the drug into the circulation, you may or may not want it to get into cells and through other barriers such as the one that protects the brain, although achieving this degree of control is not trivial. The view from Pharma is that most drugs get to their targets by passive diffusion through cell membranes. 
However, this view has been &lt;a href="http://dx.doi.org/10.1038/nrd2438" target="_window"&gt;challenged&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;'In this article, we discuss the evidence supporting the idea that rather than being an exception, carrier-mediated and active uptake of drugs may be more common than is usually assumed — including a summary of specific cases in which drugs are known to be taken up into cells via defined carriers — and consider the implications for drug discovery and development.'&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Let’s all think about how a protein can help get a drug into cells. The first way is by increasing the drug’s permeability through the membrane so that the drug can move faster down the concentration gradient. This mechanism, sometimes called &lt;a href="http://en.wikipedia.org/wiki/Facilitated_diffusion" target="_window"&gt;facilitated diffusion&lt;/a&gt;, involves temporary binding of the drug to a protein in the cell membrane and, as is the case for passive diffusion, the drug moves down its concentration gradient. Diffusion, whether facilitated or passive, is still diffusion, and it is hellishly difficult to quantify the relative contributions of the two processes for an arbitrary drug and cell. Facilitated diffusion can be saturated (just like an enzyme), but then again the lipid portion of the membrane may also have a finite capacity for drug.&lt;br /&gt;&lt;br /&gt;Active transport can oppose diffusion and, when this happens, it may be easier to observe (in principle anyway). An input of energy is required when a drug is coerced up its concentration gradient (hence the term active). Let’s suppose that we have a barrier across which we believe active transport takes place and that we want to test this hypothesis. The usual approach is to measure the permeability from each side of the barrier and see how different the two measurements are. 
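&lt;br /&gt;&lt;br /&gt;The difference between passive and facilitated diffusion can be sketched with a toy model. This is our own illustration under stated assumptions. Passive flux is taken to be proportional to the concentration difference. The carrier (with hypothetical parameters j_max and k_m) is assumed to bind drug with equal affinity on both faces, so its net flux is the difference of two saturable, Michaelis-Menten-like terms. The model ignores any finite capacity of the lipid itself.

```python
def passive_flux(perm, c_donor, c_acceptor):
    """Passive diffusion: net flux is linear in the concentration difference."""
    return perm * (c_donor - c_acceptor)


def facilitated_flux(j_max, k_m, c_donor, c_acceptor):
    """Net flux through a symmetric, saturable carrier (toy model).

    Carrier occupancy on each face follows c / (k_m + c), as for an enzyme;
    the net flux is the difference between the two unidirectional fluxes.
    """
    occupancy_donor = c_donor / (k_m + c_donor)
    occupancy_acceptor = c_acceptor / (k_m + c_acceptor)
    return j_max * (occupancy_donor - occupancy_acceptor)


# Hypothetical parameters: raise the donor concentration, acceptor kept empty.
for c in (1.0, 10.0, 100.0, 1000.0):
    print(c, passive_flux(0.1, c, 0.0), round(facilitated_flux(10.0, 5.0, c, 0.0), 2))
```

Passive flux keeps rising in proportion to concentration, while the carrier flux levels off just below j_max. Both fall to zero when the two sides are at the same concentration. Swapping the compartments simply reverses the sign of each flux, so neither process on its own makes the permeability measured from one side differ from that measured from the other.&lt;br /&gt;&lt;br /&gt;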
Another possible approach, which doesn’t seem to get mentioned, would be to allow the system to come to steady state and measure the concentrations on either side of the barrier.&lt;br /&gt;&lt;br /&gt;Now we’ll take a closer look at the article. The authors state that &lt;em&gt;‘There is abundant evidence for carrier mediated drug uptake in specific cases where it has been studied’&lt;/em&gt;. However, we were confused by much of the evidence for carrier-mediated drug uptake that was presented by the authors. What does it mean for a drug to be a substrate for a transporter? Is it possible to quantify the relative contributions of passive diffusion through the lipid bilayer and carrier-mediated uptake? How predictive are these cell-based assays of the physiological reality of an &lt;em&gt;in vivo&lt;/em&gt; situation? Is this carrier-mediated transport passive or active? Just how many drugs have been evaluated in these studies? &lt;br /&gt;&lt;br /&gt;Another point made in support of the view that &lt;em&gt;‘rather than being an exception, carrier-mediated and active uptake of drugs may be more common than is usually assumed’&lt;/em&gt; is that &lt;em&gt;‘Drugs can concentrate in specific tissues’&lt;/em&gt;. The authors demonstrate some awareness of the difficulties of defining a relevant intracellular concentration when they say, &lt;em&gt;‘Binding is probably not the major issue as intracellular concentrations can be significantly larger than any plausible stoicheiometric concentration of binding sites’&lt;/em&gt;. However, they do not seem to be aware that basic compounds can accumulate in the acidic interior of a lysosome (the neutral form diffuses in, becomes protonated and is then trapped). It is not actually necessary to invoke active transport to provide a rationale for this observation.&lt;br /&gt;&lt;br /&gt;Now we want you to take a look at Figure 3b, which shows a plot of Caco-2 cell permeability against logK. 
Like us, you were probably wondering what K is, and you were probably as surprised as we were to learn that K is the octanol-water partition coefficient, which everybody else calls P. So logK is just our old friend logP, although we don’t know whether it is a predicted or a measured quantity. Our advice to Systems Biologists is to use logP rather than logK for this quantity if you want people in Drug Discovery to think that you know what you’re talking about. You’ll notice that the correlation between permeability and logK (which is really logP) is not very good. There are a number of plausible explanations for this observation that have nothing to do with carrier-mediated transport. Firstly, if a compound contains ionisable groups there will be less of the neutral form available for partitioning into the membrane, so perhaps you should be looking at logD or at least accounting for the ionisation. Secondly, octanol is not a good model for the membrane core because it has a polar hydroxyl group and gets pretty wet when in contact with water.&lt;br /&gt;&lt;br /&gt;However, there is a third reason why you might not see a great correlation between Caco-2 cell permeability and logP. It has nothing to do with logP and everything to do with how permeability is measured. You’ll need to think a bit about this, but please don’t worry because we’ll be right next to you all the time. First, take a look at this &lt;a href="http://www.apredica.com/caco2.php" target="_window"&gt;helpful description of the Caco-2 permeability assay&lt;/a&gt; because it’ll give you an idea of how these assays are run. When measuring permeability through a barrier you’ll typically introduce a known amount of compound on what we’ll call the ‘donor’ side of the barrier. You then measure compound concentration on both sides (‘donor’ and ‘acceptor’) of the barrier as a function of time. Now consider the situation in which concentration is measured at a single time point. 
If the permeability is really low, the quantity of compound on the acceptor side of the membrane will be too small to measure, and the concentration on the donor side will not differ detectably from the initial concentration. You can also have problems if the compound is too permeable, because you can end up with similar concentrations on either side of the barrier, which can lead to significant uncertainty in the measured values of permeability. It also means that there is an upper limit to the permeability that can be measured, and you can get an idea of what this will be by taking a really close look at the assay protocol. &lt;br /&gt;&lt;br /&gt;There’s other information that you can get from Caco-2 permeability assays. A Caco-2 cell monolayer is polarized, having apical (A) and basolateral (B) faces, and intestinal absorption is in the A→B direction. However, you can also measure the permeability in the B→A direction, and a significant difference from the A→B permeability is an indication that the compound is actively transported through the monolayer of cells. Suppose, as the authors suggest is frequently the case, that compounds require active transport in order to be absorbed from the gut. Then you’d expect the A→B permeability to frequently exceed the B→A permeability in Caco-2 (and &lt;a href="http://www.cyprotex.com/cloescreen/in-vitro-permeability/mdr1-mdck-permeability/" target="_window"&gt;MDCK&lt;/a&gt;) cell permeability assays. It is rather curious that the authors do not discuss this point in connection with Figure 3b.&lt;br /&gt;&lt;br /&gt;Our experience with Caco-2 assays is that it is actually more common for the B→A permeability to exceed the A→B permeability. This situation corresponds to efflux, in which transporters pump the compound back into the gut, and the ratio of B→A to A→B permeability is called the efflux ratio. 
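&lt;br /&gt;&lt;br /&gt;For anyone who has not met the arithmetic, apparent permeability is usually computed as P&lt;sub&gt;app&lt;/sub&gt; = (dQ/dt)/(A × C&lt;sub&gt;0&lt;/sub&gt;). Here dQ/dt is the rate at which compound appears on the acceptor side, A is the area of the monolayer and C&lt;sub&gt;0&lt;/sub&gt; is the initial donor concentration. The formula assumes sink conditions, with the acceptor concentration staying well below the donor concentration, which is exactly what breaks down for very permeable compounds. The sketch below is a minimal illustration; the function names, the monolayer area and the time-course data are all hypothetical.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def papp(times_s, acceptor_nmol, area_cm2, c0_nmol_per_ml):
    """Apparent permeability in cm/s: (dQ/dt) / (A * C0), assuming sink conditions."""
    dq_dt = slope(times_s, acceptor_nmol)  # nmol/s
    return dq_dt / (area_cm2 * c0_nmol_per_ml)  # 1 mL = 1 cm^3, so units are cm/s


def efflux_ratio(papp_b_to_a, papp_a_to_b):
    """Ratio of B→A to A→B permeability; values well above 1 suggest efflux."""
    return papp_b_to_a / papp_a_to_b


# Hypothetical time course: amount (nmol) found on the acceptor side.
times = [0.0, 1800.0, 3600.0, 5400.0]
a_to_b = papp(times, [0.0, 0.09, 0.18, 0.27], area_cm2=1.12, c0_nmol_per_ml=10.0)
b_to_a = papp(times, [0.0, 0.27, 0.54, 0.81], area_cm2=1.12, c0_nmol_per_ml=10.0)
print(f"A->B {a_to_b:.2e} cm/s, B->A {b_to_a:.2e} cm/s, ER {efflux_ratio(b_to_a, a_to_b):.1f}")
```

With these made-up numbers the efflux ratio comes out at about 3. That is the B→A greater than A→B pattern that, in our experience, turns up more often than the reverse.&lt;br /&gt;&lt;br /&gt;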
If a compound can make it to the blood stream without the assistance of active transport then it’s difficult to argue a case for the same compound requiring active transport to get into other cells. In our view, the authors have not presented convincing evidence to support the view that carrier-mediated and active uptake are particularly common. We were surprised that they did not look at measured ratios of B→A to A→B permeability in Caco-2 monolayers, since these would have provided evidence with which they could have tested their hypothesis.&lt;br /&gt;&lt;br /&gt;Despite our perception that the case presented by the authors is weak, we certainly accept that transporter-mediated uptake of drugs does occur in some cases. Our issue is more with the idea that it is the norm rather than the exception. We were more than a little surprised to read in the abstract of the &lt;a href="http://dx.doi.org/10.2174/156802609787521616" target="_window"&gt;follow-up article&lt;/a&gt; that:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;‘Drug entry into cells was previously thought to be via diffusion through the lipid bilayer of the cell membrane with the contribution to uptake by transporter proteins being of only marginal importance. Now, however drug uptake is understood to be mainly transporter-mediated’.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Understood? 
By whom?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-3442880188039873287?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/3442880188039873287/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=3442880188039873287' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3442880188039873287'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3442880188039873287'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2011/04/transit-disagreement.html' title='The Transit Disagreement'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-4293254268587486349</id><published>2011-01-01T03:01:00.000-08:00</published><updated>2011-01-01T03:11:14.671-08:00</updated><title type='text'>Happy New Year</title><content type='html'>Happy New Year from &lt;a href="http://gmc2007.blogspot.com/2009/11/group-manager-to-pharma-fellow-in-one.html" target="_window"&gt;Pharma Fellow&lt;/a&gt;, &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellow&lt;/a&gt;, The &lt;a href="http://gmc2007.blogspot.com/2007/11/cambridge-one-gothenburg-nil.html" target="_window"&gt;Blue Team&lt;/a&gt;, The &lt;a 
href="http://gmc2007.blogspot.com/2007/11/misadventures-in-reciprocal-space.html" target="_window"&gt;Red Team&lt;/a&gt; and the rest of us here at The Great Molecular Crapshoot.  Please make it your new year resolution to avoid all &lt;a href="http://gmc2007.blogspot.com/2010/06/easing-off-on-categorical-sin.html" target="_window"&gt;Categorical Sin&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-4293254268587486349?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/4293254268587486349/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=4293254268587486349' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4293254268587486349'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4293254268587486349'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2011/01/happy-new-year.html' title='Happy New Year'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5067261302025540523</id><published>2010-12-29T17:39:00.000-08:00</published><updated>2010-12-30T04:51:36.196-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='stamp collecting'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>The wisdom of herds</title><content 
type='html'>Well, it has indeed been a while and we hope that many of you will have enjoyed a restful holiday season.  Back in September we &lt;a href="http://gmc2007.blogspot.com/2010/09/experts-and-how-to-avoid-them.html" target="_window"&gt;wrote about Experts&lt;/a&gt; and were so traumatised by the experience that it is only now that we can return to the theme.  At least this time we promise to avoid all mention of Visionaries, because the self-appointed Visionary is simultaneously one of the most pathetic and one of the most irritating people that you will ever encounter during a career in Pharma.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://dx.doi.org/10.1038/nchembio0709-441" target="_window"&gt;article featured in this Crapshoot&lt;/a&gt; is a crowdsourcing evaluation of some chemical probes.  What is crowdsourcing, we hear you cry, and what place does it have in a journal that sees itself as associated with serious science?  The honest answer is that we don’t know, because the domain of applicability of our own expertise lies well outside the social ‘sciences’.  So let’s go through this together.&lt;br /&gt;&lt;br /&gt;The basis of crowdsourcing is the wisdom of crowds, which is defined in Box 2 as:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;‘This concept, popularized by Surowiecki, describes group decision making based on the aggregation of independent, individual decisions, where the average decision is more accurate than any individual decision. The four elements of a wise crowd are independence, diversity of opinion, decentralization and aggregation.’ &lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Previously we have read about the behaviour of crowds of investors.  In discussions about the behaviour of investors we frequently encounter terms such as ‘greed’ and ‘panic’ and rarely (if ever) hear groups of individuals described as ‘rational’ (except by the most theoretical of Economists).  
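&lt;br /&gt;&lt;br /&gt;Independence is the element of a wise crowd that herding destroys, and a toy simulation shows why it matters. This is our own sketch, not anything from the featured article. Each of 11 hypothetical assessors (the size of the CSG) makes an error of unit variance, and a fraction rho of that variance comes from an opinion shared by the whole herd. Averaging only cancels the independent part, so the error of the crowd average is roughly the square root of rho + (1 - rho)/n rather than 1/sqrt(n).

```python
import math
import random


def crowd_rmse(n_members, rho, n_trials=5000, seed=42):
    """Root-mean-square error of the crowd average (toy model).

    Each member's error is sqrt(rho) * shared + sqrt(1 - rho) * own, where
    shared and own are independent standard normal draws; rho is therefore
    the correlation between any two members' errors (the degree of herding).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)  # the opinion the whole herd has in common
        errors = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(n_members)
        ]
        crowd_error = sum(errors) / n_members
        total += crowd_error ** 2
    return math.sqrt(total / n_trials)


for rho in (0.0, 0.5, 0.9):
    expected = math.sqrt(rho + (1.0 - rho) / 11)
    print(rho, round(crowd_rmse(11, rho), 2), round(expected, 2))
```

A single assessor has an RMSE of 1 in this model. With independent errors the average of 11 does about three times better. With strong herding the average is barely better than asking one person.&lt;br /&gt;&lt;br /&gt;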
Crowds of investors and Pharma scientists share a herding instinct that can lead to group decision making that is neither rational nor accurate.  Perhaps we should really be talking about herdsourcing.&lt;br /&gt;&lt;br /&gt;So let’s take a closer look at the crowdsourcing study.  A while back the &lt;a href="http://www.nih.gov/" target="_window"&gt;NIH&lt;/a&gt; set up the &lt;a href="http://nihroadmap.nih.gov/molecularlibraries/" target="_window"&gt;Molecular Libraries and Imaging&lt;/a&gt; (MLI) initiative.  The people in the Molecular Libraries Screening Centers Network (MLSCN) did some screening and, among other things, nominated 64 chemical probes.   At this point the crowdsourcers dropped by, and we really couldn’t help being reminded of the term &lt;a href="http://en.wikipedia.org/wiki/Seagull_manager" target="_window"&gt;Seagull Manager&lt;/a&gt;.  The crowdsourced group (CSG) are described both as ‘a team of 11 scientists with diverse backgrounds in small molecule discovery’ and as ‘well-known experts in preclinical drug discovery’.  The team was invited to express their level of confidence in the probes, and we were greatly amused to encounter the term ‘molecular confidence’.  Can you imagine the answers to a question in Cheminformatics 101 asking you to use the term ‘molecular confidence’?    (Brimming with molecular confidence, tetrafluoromethane knew with certainty that she’d be able to handle anything that the cytochrome P450s threw at her.)&lt;br /&gt;&lt;br /&gt;The ‘evaluation of the probes was performed on a qualitative ranking of 0 (high confidence, low dubiosity) to 10 (low confidence, high dubiosity)’.  We are not sure that a scale of 0 to 10 can be described as ‘qualitative’.   The ranking scheme may be inaccurate, imprecise and/or irrelevant, but we just can’t see how something with eleven levels (we assume that non-integer scores were not allowed) can be described as qualitative rather than quantitative.   
It was not clear to us how the individual members of the CSG determined the level of dubiosity, and we did not find that the James Joyce quote conveyed any information other than an impression of pretentiousness.    We would have liked to know a bit more about how the CSG assigned numbers to compounds.  Did they perform a detailed analysis or simply gaze expertly at structures?  However, these details are not necessary for what we want to do next, which is to take a closer look at the ‘Experts’.&lt;br /&gt;&lt;br /&gt;The featured article could actually have been a very interesting study of ‘Experts’.  Many interesting questions could have been addressed.  Are some CSG members harsher on average than others in their assessment of probes?  Is it possible to cluster the team members on the basis of their scores?  Could the same information have been obtained with a smaller CSG?  Who were the dissenters and who was most likely to regress to the mean? Perhaps it was a missed opportunity, but we won’t dwell on this because we’re itching to get on to what really interests us about this work.  How were the CSG members selected? &lt;br /&gt;&lt;br /&gt;As we have already noted, the authors of the study describe the CSG members as ‘well-known experts in preclinical drug discovery’. Since the CSG members are also authors, it is completely understandable that their expertise should be asserted in this manner.  In our view some members of the CSG are not exactly household names, and it is not clear that they are any more expert than the MLSCN personnel who nominated the probes in the first place.  If we were assembling a group of Experts we’d want each one to have been corresponding author for some (non-review) articles in the last 3 years or so.  We noticed (see Acknowledgements) that no fewer than six people contributed to the vote of one CSG member. We wondered whether all six were Experts while speculating about the magnitude of their contribution.  
Another CSG member had only voted on a small minority of the 64 probes.&lt;br /&gt;&lt;br /&gt;The assembly of the CSG raises some other issues that probably shouldn’t be talked about in polite company although we won’t let that inhibit us.   Being described as ‘An Expert’ in a publication like this is beneficial both to the individual concerned and the organisation to which he or she belongs.  As such there is a potential conflict of interest issue that at least needs to be acknowledged.&lt;br /&gt;&lt;br /&gt;Crowdsourcing or herdsourcing?  It is not for us to say for we are simple folk.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;No Experts or ‘Experts’ were harmed in the production of this Crapshoot.&lt;/em&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5067261302025540523?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5067261302025540523/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5067261302025540523' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5067261302025540523'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5067261302025540523'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2010/12/wisdom-of-herds.html' title='The wisdom of herds'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-8819113576049855481</id><published>2010-09-11T03:46:00.000-07:00</published><updated>2010-10-03T05:28:22.523-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='organisational'/><category scheme='http://www.blogger.com/atom/ns#' term='pharma life'/><title type='text'>Experts and how to avoid them</title><content type='html'>Anyone who has worked for any length of time in the Pharmaceutical Industry will have encountered The Expert.  Whether internal or external, The Expert can be defined as somebody who does not need to defend their opinion but merely needs to state it.  We have fond memories of a charming and clever Ghanaian lady from our undergraduate days who pithily summarised An Expert as “somebody white a long way from home”. &lt;br /&gt; &lt;br /&gt;So how do we end up with all these Experts?  Firstly, a desire to be seen as An Expert, or even better, The Expert, is deeply rooted in the psyche of almost every professional scientist although the desire will remain unfulfilled for the vast majority even if they are unaware of this.  This means that if An Expert is required there will be no shortage of volunteers.  As we’ve mentioned before, the senior Pharma management (who prefer to call themselves Leaders, much to our amusement since Leadership implies Direction) just can’t handle multiple scientific opinions and find it much easier to disconnect their central nervous systems and find An Expert.  
It is this mental frailty of senior management that creates the intellectual vacuum in which both &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellow&lt;/a&gt; and &lt;a href="http://gmc2007.blogspot.com/2009/11/group-manager-to-pharma-fellow-in-one.html" target="_window"&gt;Pharma Fellow&lt;/a&gt; can thrive.&lt;br /&gt;&lt;br /&gt;The basic problem with an Expert-based scientific culture is that it insulates top-level decision makers from scientific opinion.  We have already discussed the &lt;a href="http://gmc2007.blogspot.com/2007/06/rule-of-5-sociological-fallout.html" target="_window"&gt;emergence of professional Opinion-Havers&lt;/a&gt; in Pharma and how grumpy they get when any of the Great Unwashed dare to have an opinion.  The other problem with professional Opinion-Havers is that they tend to be slow to change their opinions in response to new findings.  This is partly due to innate psychology, but it also reflects the dangers of changing one’s opinion.  How can this be, M. le Crapshoot? Surely the danger lies in not changing one’s opinion in the light of new evidence?  We admit that you may have a point. However, the clear and present danger is that an Opinion-Haver who changes his or her opinion too often risks fatiguing (and therefore ultimately losing the support of) the senior Pharma management. They really want the science to be kept in nice, easily digestible (just like baby food) sound-bites and bullet points.  Suffice it to say that the absolute worst thing that could happen to the professional Opinion-Haver is to be returned to the ranks of The Great Unwashed.&lt;br /&gt; &lt;br /&gt;Just like a QSAR model, An Expert has a &lt;a href="http://gmc2007.blogspot.com/2010/06/easing-off-on-categorical-sin.html" target="_window"&gt;domain of applicability&lt;/a&gt;.  
Step outside that domain and the smooth rotation of fan blades is likely to be inhibited by the sudden adherence of a brown, viscous material that smells a whole lot like 3-methylindole.  Again like a QSAR model, it’s not always obvious when The Expert has strayed outside his or her domain of applicability.   Now you’ll be able to see the problem.  You’re a senior manager in a large pharmaceutical company and you want to get a technical view on something that’s cropped up.   What should you do?   Don’t worry, just recycle one of the Experts you’ve already got and let somebody else worry about that tiresome domain of applicability stuff.  They’re Experts after all, so surely one of them can handle this...&lt;br /&gt;&lt;br /&gt;Being An Expert has its benefits, so there’s no shortage of people wanting to join this club.  Some will simply say that they are Experts, although this always makes us cringe because we regard stating that one is An Expert as the first step down an extremely slippery slope.   However, it is instructive to see how many people claim to be Experts in their professional networking profiles.  At least this is not as bad as claiming to be a Visionary (yes, people really do put that in their profiles as well), which indicates that a prolonged course of strong pharmaceutical intervention is required. &lt;br /&gt; &lt;br /&gt;We’re going to wrap things up there because writing this is just getting too depressing.  On Monday we’re going to spend four hours with a particularly tedious academic who does natural product synthesis.   He’s an old university buddy of our head of department, who thinks that he will help us optimise CNS penetration.  However, we spent a substantial portion of the previous meeting with him explaining why we were interested in the binding of our compounds to serum proteins such as albumin, and we are not optimistic that it’s going to be any better on Monday.  
Why can’t he just go and talk to Senior Pharma Fellow...&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Any similarity between the characters in this Crapshoot and persons alive or dead is entirely coincidental. No children, animals, Experts, Pharma Fellows or Senior Pharma Fellows were harmed in the preparation of this Crapshoot.&lt;/em&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-8819113576049855481?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/8819113576049855481/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=8819113576049855481' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8819113576049855481'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8819113576049855481'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2010/09/experts-and-how-to-avoid-them.html' title='Experts and how to avoid them'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-4182032766317771114</id><published>2010-08-25T10:51:00.000-07:00</published><updated>2012-01-27T13:21:45.875-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='metric'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='stamp 
collecting'/><category scheme='http://www.blogger.com/atom/ns#' term='pfizer'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='jcamd'/><category scheme='http://www.blogger.com/atom/ns#' term='pharmacokinetics'/><title type='text'>Metabolic efficiency</title><content type='html'>So at long last it’s on to &lt;a href="http://dx.doi.org/10.1007/s10822-008-9242-3" target="_window"&gt;PMI&lt;/a&gt;. It really has been ages since we shot the crap and we hope that one or two of our single figure readership might have missed us once or twice while we’ve been away. What is PMI, we hear you cry? Is it something unpleasant that occurs on a monthly basis? Is it sinful and could &lt;a href="http://gmc2007.blogspot.com/2009/11/group-manager-to-pharma-fellow-in-one.html" target="_window"&gt;Pharma Fellow&lt;/a&gt; have helped us avoid the temptation? Patience, Dear Readers, PMI is none of the above. It is a metabolism index which is intended to quantify the effects of specific structural changes on metabolic stability. So let’s take a look at how it works.&lt;br /&gt;&lt;br /&gt;The PMI is defined by making structural pair-wise comparisons of human liver microsome (HLM) stability. But what is a structural pair-wise comparison? This question is best illustrated by an example. Let’s suppose that you want to explore the effect of a chloro-substitution on HLM stability. All you need to do is search the database of HLM stability for pairs of structures which are identical except that one has a chlorine atom and the other has a hydrogen atom replacing the chlorine atom. Then you can pose the question of whether chloro-substitution is ‘good’ for HLM stability by averaging some measure of the change in stability resulting from the substitution over all the pairs. 
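&lt;br /&gt;&lt;br /&gt;Here is a minimal sketch of the averaging approach, working with log(k_Cl/k_H) rather than the raw ratio. It is our illustration rather than the authors’ code. The pair data are hypothetical intrinsic clearances for three chloro/hydrogen matched pairs.

```python
import math
import statistics


def log_ratios(pairs):
    """log10(k_X / k_H) for each matched pair of intrinsic clearances.

    pairs is a list of (clint_substituted, clint_parent) tuples, where the
    parent carries hydrogen in place of the substituent.
    """
    return [math.log10(k_x / k_h) for k_x, k_h in pairs]


def summarise(pairs):
    """Mean and sample standard deviation of the log ratios."""
    values = log_ratios(pairs)
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical Clint values (same units within each pair) for Cl vs H analogues.
chloro_pairs = [(10.0, 20.0), (5.0, 20.0), (40.0, 20.0)]
mean_log, sd_log = summarise(chloro_pairs)
print(f"mean log ratio {mean_log:.2f} (fold change {10 ** mean_log:.2f}), sd {sd_log:.2f}")
```

A negative mean log ratio means that the chloro compounds are, on average, cleared more slowly (and so are more stable) than their hydrogen analogues. The spread is worth reporting alongside the mean.&lt;br /&gt;&lt;br /&gt;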
There is of course nothing to stop you looking at other characteristics of the distribution such as variance or even kurtosis (which we don’t understand but believe to be a toe-nail pathology of a most unsavoury aspect). &lt;br /&gt;&lt;br /&gt;The data analysis bit can get a bit sticky and our most loyal Readers may even remember our &lt;a href="http://gmc2007.blogspot.com/2008/04/substituents-potencies-and-pinschers.html" target="_window"&gt;review of an analogous study&lt;/a&gt; a couple of years ago. There are a number of ways you can do this but whatever you do a good starting point is to define k_Cl and k_H as the rates (Clint = intrinsic clearance) of HLM metabolism for the chloro compound and the analogue in which the chlorine atom has been replaced with hydrogen. You can then average k_Cl/k_H or log(k_Cl/k_H) over all the structural pairs to quantify the effect of chloro-substitution on stability. However, the authors of the study did something rather different. First they state that ratios of k_Cl/k_H between 0.5 and 2 are not significantly different from 1 and so they classify these pairs as showing a ‘neutral effect on the particular ADME property’. A ratio above 2 is considered a significant increase and below 0.5 is considered a significant decrease and the PMI is defined as the percentage of pairs with ratio below 0.5 minus the percentage of pairs with ratio above 2. A positive value for a substituent’s PMI indicates that, on average, substitution increases HLM stability relative to hydrogen.&lt;br /&gt;&lt;br /&gt;Regular Readers of The Crapshoot will immediately recognise the possibility of &lt;a href="http://gmc2007.blogspot.com/2010/06/easing-off-on-categorical-sin.html" target="_window" &gt;Categorical Sin&lt;/a&gt; although they’ll not be sure yet whether this can be fixed with a few Hail Marys or whether we need to call in a professional for the deluxe package of confession, chanting and holy water. 
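&lt;br /&gt;&lt;br /&gt;The authors’ binning can be written down in a few lines, which also makes the cut-off problem easy to see. This is a minimal sketch of PMI as described above: the percentage of pairs with k_X/k_H below 0.5 minus the percentage above 2. Treating ratios of exactly 0.5 or 2 as neutral is our assumption, and the ratios themselves are hypothetical.

```python
def pmi(ratios, low=0.5, high=2.0):
    """Percentage of pairs with ratio below `low` minus percentage above `high`.

    Each ratio is k_X / k_H (clearance of substituted over parent compound),
    so a ratio below `low` counts as more stable and above `high` as less stable.
    Ratios from `low` to `high` inclusive are treated as neutral (an assumption).
    """
    n_more_stable = sum(1 for r in ratios if low > r)
    n_less_stable = sum(1 for r in ratios if r > high)
    return 100.0 * (n_more_stable - n_less_stable) / len(ratios)


# Two hypothetical substituents whose pairs differ only by 1.9 versus 2.1.
print(pmi([0.3, 0.4, 1.9, 1.9, 1.9]))  # three pairs binned as neutral
print(pmi([0.3, 0.4, 2.1, 2.1, 2.1]))  # the same three pairs now 'significant'
```

Nudging three ratios from 1.9 to 2.1 swings the index from 40.0 to -20.0. Moving them on from 2.1 all the way to 5 would change nothing at all.&lt;br /&gt;&lt;br /&gt;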
For those of you who’ve just joined us, the issue is whether it’s OK to use cut-offs of 0.5 and 2 to define significance. In this scheme a ratio of 1.9 is treated as being just as different from a ratio of 2.1 as it is from one of 5, while 2.1 and 5 are treated as identical. However, there is another problem because it is not clear exactly what the authors mean by significant. The issue here is how to know when a ratio differs significantly from 1. One way to address the question is to measure the ratio several times and check whether the average difference from unity is sufficiently large in comparison with the variation between measurements. Since we’re looking at ratios we’d probably do all of this with logarithms rather than with the ratios themselves, but you should get the general idea. However you do it, you can’t just look at two measurements in isolation and say that they’re significantly different. The difference between the measurements needs to be large in comparison with the variation in the two measurements and, if you have no estimate of that variation, you can’t assert significance. It really is that simple. &lt;br /&gt;&lt;br /&gt;The next question is whether it’s going to be Hail Marys or holy water. To be honest we don’t know the answer ourselves. It all depends on how much out-of-range data there is. If all the measurements are within the dynamic range of the assay then it’s unlikely that any amount of holy water would be sufficient. 
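&lt;br /&gt;&lt;br /&gt;The replicate-based check described above can be sketched as a one-sample t statistic on log10 ratios. This is a minimal Python sketch, assuming that you have at least two independent measurements of the ratio for a given pair; the function name and replicate values are made up, and you would still need to compare the statistic with a critical value of t for the appropriate degrees of freedom.&lt;pre&gt;
```python
import math
import statistics


def t_statistic_log_ratio(ratios):
    """One-sample t statistic for mean log10(ratio) versus 0 (hypothetical helper).

    ratios: replicate measurements of k_X / k_H for a single matched pair.
    statistics.stdev raises StatisticsError for fewer than two values, which
    is the point: one measurement gives no estimate of the variation.
    """
    logs = [math.log10(r) for r in ratios]
    mean = statistics.mean(logs)
    sem = statistics.stdev(logs) / math.sqrt(len(logs))  # standard error of the mean
    return mean / sem


# Made-up replicates: a tight cluster near 1.6 and a noisy set whose geometric mean is 3.
print(round(t_statistic_log_ratio([1.5, 1.6, 1.7]), 1))
print(round(t_statistic_log_ratio([1.2, 3.0, 7.5]), 1))
```
&lt;/pre&gt;With these made-up numbers the tight cluster, every member of which sits inside the ‘neutral’ band, gives a much larger t statistic than the noisy set, two of whose three ratios lie above 2.&lt;br /&gt;&lt;br /&gt;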
We suspect that quite a few of the measurements used to construct Table 1 lay outside the dynamic range of the assay but we just don’t know and think that it would have been polite of the authors to include this information for each substituent.&lt;br /&gt;&lt;br /&gt;We clearly have some gripes with PMI as a predictor of HLM stability but we’ve covered similar ground before and we don’t want to alienate our Loyal Readers by boring them shitless (they should be able to get hold of &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellow’s&lt;/a&gt; last Vision Statement if they want to really wallow in their boredom). Let’s, just for the sake of argument, assume that PMI is a perfectly acceptable measure of HLM stability. “Monsieur le Crapshoot, have you gone soft in the head? Is your brain truly turned to mush?”, we hear you cry and our response is simply, “Please trust us and you will achieve enlightenment”. The reason we are putting our scepticism on hold is that, even if PMI smelt of roses, there are other aspects of this work that have rather different aromas.&lt;br /&gt;&lt;br /&gt;Let’s get onto MLE which stands for Metabolism-Lipophilicity Efficiency. The basic idea is that the more lipophilic a compound is, the more likely the microsomes are to ring the dinner bell, so why not extract the contribution of lipophilicity and appreciate the substituents for their personalities? We ask you now to take a look at Figure 1 which shows a plot of some PMI values against something called delta_cLogP, which is the cLogP value of the substituted compound minus the cLogP value of the unsubstituted compound. A positive value of delta_cLogP indicates that the substituted compound is more lipophilic than its parent (i.e. the substituent is more lipophilic than hydrogen). Hopefully we’ve not lost anyone yet.&lt;br /&gt;&lt;br /&gt;Let’s take a closer look at Figure 1 in the article. Does anybody smell anything unpleasant? 
OK we’ll put you out of your misery. The PMI values in Figure 1 have only been plotted for para-substituents. Why have the authors not plotted all the data? Are they worried about overloading their readers with too much data? We suspect that the main reason is that Figure 1 would look very different and the correlation between PMI and delta_cLogP would look a lot weaker.&lt;br /&gt;&lt;br /&gt;At last we’re in a better position to try to understand MLE but first we must define it:&lt;br /&gt;&lt;br /&gt;MLE = PMI + 25 x delta_cLogP&lt;br /&gt;&lt;br /&gt;You may be wondering where the 25 comes from and in fact so were we. ‘The scaling factor of 25-fold is added to give a balanced weighting of the two factors’. What does this mean? How is the precise degree of balance determined? What would be wrong with 21, 30 or 25.2? Is the required degree of balance determined by the PMI values for para-substitution or by all the data? Please don’t ask us for we only write The Crapshoot and to be quite honest we’re just as confused as you are.&lt;br /&gt;&lt;br /&gt;Perhaps we should take a closer look at the plot of MLE against delta_cLogP which is Figure 2 in the article. MLE is the sum of two quantities. The first of these quantities is PMI which tends, as we’ve seen in the article’s Figure 1, to get smaller as delta_cLogP gets larger. The second of these quantities is 25 x delta_cLogP which, we hope you’ll agree, is perfectly correlated with delta_cLogP. So let’s all put our heads together to try to think of what all this means. Figure 1 is a good place to start and we’ll note that PMI is just MLE with the scaling factor of 25 set to zero. Now consider what happens when that scaling factor is increased from zero. Hopefully you’ll all be able to see that the slope of the line of fit will get less negative before becoming positive, since each unit added to the scaling factor adds exactly one unit to the least-squares slope. Ultimately you’ll end up with something that looks like Figure 2. 
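&lt;br /&gt;&lt;br /&gt;The effect of the scaling factor on the slope is easy to check numerically. In this minimal Python sketch the PMI and delta_cLogP values are invented rather than taken from the article, and the function names are our own. Because adding a multiple of x to y adds that same multiple to the least-squares slope of y on x, the slope of MLE against delta_cLogP is always the slope of PMI against delta_cLogP plus the scaling factor.&lt;pre&gt;
```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def mle(pmi_values, delta_clogp, scale=25.0):
    """MLE = PMI + scale * delta_cLogP; the article uses scale = 25."""
    return [p + scale * d for p, d in zip(pmi_values, delta_clogp)]


# Made-up substituents in which PMI tends to fall as delta_cLogP rises.
delta_clogp = [-0.5, 0.0, 0.5, 0.7, 1.0, 1.5]
pmi_values = [20.0, 35.0, -10.0, 15.0, -20.0, 5.0]

for scale in (0.0, 10.0, 25.0):
    # scale = 0 reproduces the PMI plot; the slope rises by exactly scale.
    slope = ols_slope(delta_clogp, mle(pmi_values, delta_clogp, scale))
    print(scale, round(slope, 2))
```
&lt;/pre&gt;With these made-up numbers the slope starts out negative, becomes less negative at a scaling factor of 10 and is positive by the time the factor reaches 25.&lt;br /&gt;&lt;br /&gt;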
“Wait a minute”, we hear you cry, “the data in Figure 2 look very different to the data in Figure 1 so how can you make a statement like that?”. That we don’t deny, although the data look different simply because they are different. In Figure 1 they’ve only plotted the data for para-substituents while in Figure 2 they’ve plotted all the data. Had they plotted all the data in Figure 1 they’d have seen a lot more scatter in the plot, kind of like what we see in Figure 2.&lt;br /&gt;&lt;br /&gt;We must confess to not being able to understand the authors’ motivation for defining MLE. The trend illustrated in Figure 2 owes as much to a mystery scaling factor that has been plucked out of the ether as it does to effects of substituents on metabolic stability. One might even say that the observed trend is due to the scaling factor while metabolism is responsible for the scatter about it. The authors do state that the plot of MLE against delta_cLogP ‘is useful for highlighting the positional influence of a group’ although presumably the plot of PMI against delta_cLogP would also have done this if the authors had plotted the data for ortho and meta substituents as well.&lt;br /&gt;&lt;br /&gt;So there you have it. PMI and MLE have been introduced to quantify the effects of substituents on metabolic stability. Although we have some issues with PMI, we concede that it does contain some information. However, we are struggling to see how adding 25 x delta_cLogP to PMI increases this information content. MLE is an example of an Efficiency Metric and, despite the frequency with which these crop up in the literature, Pharma doesn't seem to be getting any more efficient.&lt;br /&gt;&lt;br /&gt;You may wonder, Loyal Readers, whether we have better things to do with our time than point out the essential hollowness of MLE, and we certainly do. 
However, the hollowness of a metric may not be as apparent to your customer-focused manager who wants you to adopt something that he's just read about in 'Metrics for Dummies'. We see the explosion of new metrics as something akin to the frictional and other forces that prevent Energy being converted to Work with 100% efficiency and have prepared this Crapshoot to provide a shield for our Readers who don't need a circus of metrics to convince them of the truth of The Second Law.&lt;br /&gt;&lt;br /&gt;You'll be relieved to know that we're just about done and all that remains is to ask an important question. Is P happy to be the P in PMI? When the data analysis provides clear insights then being associated with it is clearly a good thing. We suspect that MLE does not fall into this category. Having P in PMI could be taken as an endorsement, by P, of the whole package including MLE.&lt;br /&gt;&lt;br /&gt;So there you have it. PMI and MLE. Is P happy to be the P in PMI? Does P even care? 
Don't ask us, ask P, for we are simple folk and only write The Crapshoot.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-4182032766317771114?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/4182032766317771114/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=4182032766317771114' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4182032766317771114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4182032766317771114'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2010/08/metabolic-efficiency.html' title='Metabolic efficiency'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-6538796509110783373</id><published>2010-06-03T22:02:00.000-07:00</published><updated>2010-09-26T15:06:27.286-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='categorical sin'/><category scheme='http://www.blogger.com/atom/ns#' term='latent indicator variable'/><category scheme='http://www.blogger.com/atom/ns#' term='qsar'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><title type='text'>Easing off on Categorical Sin</title><content type='html'>It is now time to move on a bit.  
We have been &lt;a href="http://gmc2007.blogspot.com/2010/04/in-loo-of-crapshoot.html" target="_window"&gt;toying with QSAR and predictive modelling&lt;/a&gt; for some time and have long since tired of the sport which can be likened to shooting sedated pheasants, never the intellectual giants of the avian world, at close range.   We have highlighted two errors that are frequently made when building models for biological activity and physicochemical properties.  The first error is a failure to recognise that combinations of descriptors may simply be encoding substructural features (e.g. carboxyl versus tetrazole; cation versus neutral) and we have termed this the Latent Indicator Variable (LIV).  The second error is inappropriate transformation of continuous data into categorical data and we refer to this as Categorical Sin (CS).&lt;br /&gt;&lt;br /&gt;We believe that the LIV is the lesser of these two evils since it usually represents an honest mistake.  The danger of using LIVs to build your model is that you may actually be extrapolating when you think that you’re interpolating.&lt;br /&gt; &lt;br /&gt;CS is a much more serious error.  The main reason that QSAR modellers transform continuous data into categorical data is to make weak trends appear to be stronger than they actually are.  These errors are frequently anything but honest, which is why we label the underlying behaviour as Sinful, so you need to keep your eyes peeled for dirty tricks (e.g. &lt;a href="http://gmc2007.blogspot.com/2008/05/breaking-stone-in-changi.html" target="_window"&gt;hiding variation&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2008/07/desperately-seeking-signifcance.html" target="_window"&gt;plotting standard error of the mean instead of standard deviation&lt;/a&gt;).  
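&lt;br /&gt;&lt;br /&gt;For anyone wondering why plotting the standard error of the mean (SEM) rather than the standard deviation (SD) hides variation, here is a minimal Python sketch with invented pIC50 values (the helper name is our own). The SD describes the spread of individual measurements, whereas the SEM is the SD divided by the square root of the number of measurements, so it shrinks as you add data even when the spread does not.&lt;pre&gt;
```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)
    return sd, sd / math.sqrt(len(values))


# Made-up pIC50 values; repeating them raises n without changing the spread.
spread = [5.0, 6.0, 7.0, 8.0]
for copies in (1, 4, 16):
    data = spread * copies
    sd, sem = sd_and_sem(data)
    # SD stays roughly constant while SEM shrinks roughly as 1 / sqrt(n).
    print(len(data), round(sd, 2), round(sem, 2))
```
&lt;/pre&gt;Repeating the same four values gives an SD that barely moves (roughly 1.1 to 1.3) while the SEM falls from about 0.65 to about 0.14, so error bars drawn with the SEM look reassuringly tight however scattered the data are.&lt;br /&gt;&lt;br /&gt;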
There are times when it may be appropriate to transform continuous data, for example to include measured values that lie outside the dynamic range of the assay, but in general you should always be extremely wary of analysis of data which has been transformed in this manner.  You have been warned.&lt;br /&gt;&lt;br /&gt;So there we’ll leave it for now because we’re especially keen to take a look at why your employer might not want you to name your metabolism index after it.  Stay tuned.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-6538796509110783373?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/6538796509110783373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=6538796509110783373' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6538796509110783373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6538796509110783373'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2010/06/easing-off-on-categorical-sin.html' title='Easing off on Categorical Sin'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-8104478962144562038</id><published>2010-04-20T03:56:00.000-07:00</published><updated>2010-09-26T15:06:27.289-07:00</updated><title type='text'>Another year, more Crap to be 
shot</title><content type='html'>Another year rolls by.  We’re still churning out this garbage and it seems a long time since we embarked on this quest to &lt;a href="http://gmc2007.blogspot.com/2007/04/sacred-cows-make-great-hamburger.html" target="_window"&gt;put sacred cattle to the sword&lt;/a&gt;.  We are gratified that &lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-2-sacred-cow-culled.html" target="_window"&gt;Categorical Sin&lt;/a&gt; and the &lt;a href="http://gmc2007.blogspot.com/2009/02/latent-indicator-variable.html" target="_window"&gt;Latent Indicator Variable &lt;/a&gt;form an orthonormal basis that finds practical utility in describing a continuum of data-analytic sins and other deviant behaviour.  Some of your favourite characters have experienced excellent years, demonstrating that a palpable lack of talent should not be seen as an impediment to career advancement.   &lt;a href="http://gmc2007.blogspot.com/2008/11/scaling-scientific-eigernordwand-part-2.html" target="_window"&gt;Group Manager&lt;/a&gt; (now Pharma Fellow) and &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellow&lt;/a&gt; are &lt;a href="http://gmc2007.blogspot.com/2009/11/group-manager-to-pharma-fellow-in-one.html" target="_window"&gt;now respectively referred to&lt;/a&gt; as Key Opinion Leader and Thought Leader, much to the amusement of the janitorial staff.  
The only malcontents are the Red and Blue Teams whose appearances have been limited to their two outings (&lt;a href="http://gmc2007.blogspot.com/2007/11/misadventures-in-reciprocal-space.html" target="_window"&gt;1&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2007/11/cambridge-one-gothenburg-nil.html" target="_window"&gt;2&lt;/a&gt;) in 2007 and they are proving quite tiresome in their desperation for some action.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-8104478962144562038?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/8104478962144562038/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=8104478962144562038' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8104478962144562038'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8104478962144562038'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2010/04/another-year-more-crap-to-be-shot.html' title='Another year, more Crap to be shot'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-4488981547735926396</id><published>2010-04-19T04:30:00.000-07:00</published><updated>2010-09-26T15:06:27.290-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='categorical sin'/><category 
scheme='http://www.blogger.com/atom/ns#' term='latent indicator variable'/><category scheme='http://www.blogger.com/atom/ns#' term='qsar'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><category scheme='http://www.blogger.com/atom/ns#' term='validation'/><title type='text'>In LOO of a Crapshoot</title><content type='html'>Earlier we &lt;a href="http://gmc2007.blogspot.com/2010/04/who-guards-guardians.html" target="_window"&gt;showed&lt;/a&gt; you how our toy model would have passed the LOO test and in fact it would get away with leaving out more than one data point at a time (Figure 1).  For example, you might do your cross-validation by leaving out groups of three or five, and these procedures might be called leave-three-out (L3O) or leave-five-out (L5O).  Leaving out more gives you more confidence in your validation, and our toy model will validate so long as you retain at least one data point for each of the groups to ‘anchor’ the model (Figure 1).&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_9pt-rDMtsM4/S8xAIewqEMI/AAAAAAAAAB8/m_R4fWspnH8/s1600/gmc6a.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://2.bp.blogspot.com/_9pt-rDMtsM4/S8xAIewqEMI/AAAAAAAAAB8/m_R4fWspnH8/s400/gmc6a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5461810962533847234" /&gt;&lt;/a&gt;&lt;br /&gt;Now let’s see what happens when we try to do the prediction shown in Figure 2 (see purple line).  This should be a safe prediction because the model has passed the LOO test and we’re predicting from smack in the middle of the training set where we would normally have the most confidence in the model.  However, there is a slight problem.  
The linear combination of descriptors on the horizontal axis is a latent indicator variable (LIV) which for many QSAR models is Nemesis, although creators of these models are seldom aware of this.    If you have two groups of structurally related compounds whose average activities differ, and enough descriptors, you’ve got a good chance of finding a LIV that gives you some separation of the two structural groups.  If you can do this you’ll have generated a model that will cross-validate successfully.&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_9pt-rDMtsM4/S8w_8Zm6f-I/AAAAAAAAAB0/Sj1REpqtffs/s1600/gmc06b.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://2.bp.blogspot.com/_9pt-rDMtsM4/S8w_8Zm6f-I/AAAAAAAAAB0/Sj1REpqtffs/s400/gmc06b.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5461810754992373730" /&gt;&lt;/a&gt;&lt;br /&gt;The trouble comes when you try to do a prediction like the one in Figure 2.  Because we’re dealing with a LIV, this prediction actually represents an extrapolation even though the model might ‘think’ that the prediction is an interpolation.  The model may use a large number of descriptors but so long as it keeps cross-validating we keep adding more and more, per absurdum ad nauseam.  If we could only recognise the structural groups in the data we could distinguish them with proper indicator variables but instead we use LIVs which don’t tell you where the gaps are unless you look very carefully.  But enough of our views, let’s see what &lt;a href="http://dx.doi.org/10.1021/ci0342472" target="_window"&gt;PoO&lt;/a&gt; has to say, pausing only to give thanks that it’s 'overfitting' and not 'over-fitting':&lt;br /&gt;&lt;br /&gt;&lt;em&gt;“LOO does however have two blind spots.  
If the compound collection is made up of a few core chemical compositions, each of which is represented by several compounds of nearly identical composition x, then the operation of removing any single compounds will not be sufficient to get its influence out of the data set, because of the fraternal twin(s) still in the calibration.  Under these circumstances, LOO will over-state the quality of the fit.”&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;We’ll leave LOO’s other blind spot for another day because we’d like to conclude by sharing some thoughts on what we think QSAR modellers should be doing if they want to claim that their models are truly global.   First, training sets should be selected to give maximally even coverage of the space defined by the descriptors, even if that means discarding data.  Second, molecular similarity measures should be used to ensure that no two molecular structures in the training set are too similar, even if this means discarding data. &lt;br /&gt; &lt;br /&gt;We think this is quite a good place to leave things for now and hope that you’re all now on intimate terms with LIV.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2010/06/easing-off-on-categorical-sin.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-4488981547735926396?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/4488981547735926396/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=4488981547735926396' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4488981547735926396'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4488981547735926396'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2010/04/in-loo-of-crapshoot.html' title='In LOO of a Crapshoot'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_9pt-rDMtsM4/S8xAIewqEMI/AAAAAAAAAB8/m_R4fWspnH8/s72-c/gmc6a.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5728513273391273814</id><published>2010-04-11T02:16:00.000-07:00</published><updated>2010-09-26T15:06:27.309-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='categorical sin'/><category scheme='http://www.blogger.com/atom/ns#' term='latent indicator variable'/><category scheme='http://www.blogger.com/atom/ns#' term='qsar'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><category scheme='http://www.blogger.com/atom/ns#' term='validation'/><title type='text'>Who guards the guardians?</title><content type='html'>A bit over a week ago, we greatly enjoyed the &lt;a href="http://gmc2007.blogspot.com/2010/04/latent-indicator-variable-revisited.html" target="_window"&gt;reunion&lt;/a&gt; with our old friend the &lt;a href="http://gmc2007.blogspot.com/2009/02/latent-indicator-variable.html" target="_window"&gt;Latent Indicator Variable &lt;/a&gt;(LIV) and were greatly tempted to indulge in some &lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-2-sacred-cow-culled.html" target="_window"&gt;Categorical Sin &lt;/a&gt;(CS) to celebrate the occasion. 
LIV and CS both identify aberrant behaviour and these aberrations can be seen as mirror images of each other inasmuch as it is possible to get Sin to look at itself in the mirror.  Perhaps somebody much smarter and more visionary than us will demonstrate that LIV and CS are actually Fourier Transforms of each other and we will realise that Sin is simply a manifestation of the Wave Particle Duality like everything else.&lt;br /&gt;&lt;br /&gt;However, our intention is not to move from one sinful domain to another.  We have been bashing QSAR and its cousin QSPR for longer than we care to remember and are beginning to tire of this sport.   In order to maintain our wakefulness and sanity at the turkey shoot, we thought that it might be a good idea to take a closer look at Validation.  Who is Validation, we hear you cry, and as Huxley may have put it, is she ‘pneumatic’?  Validation is both QSAR’s shield and QSAR’s Achilles Heel.  Duality indeed!&lt;br /&gt;&lt;br /&gt;As you’ll have read previously in &lt;a href="http://dx.doi.org/10.1007/s10822-007-9162-7" target="_window"&gt;DoA&lt;/a&gt;, predictive modelling in Drug Discovery typically involves lots of correlated descriptors so Overfitting is an ever-present danger, especially when user-friendly (i.e. easy to generate some output) model-building software is put in the hands of grinning halfwits who have only the most rudimentary understanding of the models that they are building.  Validating your model is one way that you can convince others (and yourself) that it has not been overfit.  Model validation typically involves using only some of the data to build a model and then using the model to predict the observations that you’ve left out.   
We’ll start by taking a look at the Leave-One-Out (LOO) method for cross-validation and would like to state categorically that bringing PoO, LOO and LIV (Boers go to the livit'ry to krepp after a grit trik) together in a single Crapshoot should not be taken as a scatological comment on any specific predictive modelling methodology.&lt;br /&gt;&lt;br /&gt;LOO was described in &lt;a href="http://dx.doi.org/10.1021/ci0342472" target="_window"&gt;PoO&lt;/a&gt; and we’ll illustrate it using a couple of graphics, one of which we’ll simply recycle from the previous Crapshoot.  The LOO procedure involves discarding each data point in turn, re-fitting the model and then using the re-fitted model to predict the discarded point.    Let’s take a look at this for our enzyme inhibition model from the &lt;a href="http://gmc2007.blogspot.com/2010/04/latent-indicator-variable-revisited.html" target="_window"&gt;previous Crapshoot&lt;/a&gt; which is illustrated in Figure 1.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/_9pt-rDMtsM4/S8GUiG6d4YI/AAAAAAAAABs/yGV8zVWpa_I/s1600/gmc04a.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://1.bp.blogspot.com/_9pt-rDMtsM4/S8GUiG6d4YI/AAAAAAAAABs/yGV8zVWpa_I/s400/gmc04a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5458807537042055554" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Now let’s see what happens when we leave out one of the data points, an operation that we’ll show by drawing the discarded point as an unfilled circle (Figure 2).  You’ll see that the new line of fit (dashed black line) has moved away from the point that was discarded since that point can no longer influence the fit.  From the predictions for the discarded points you can calculate something called q-squared (q**2), which is similar to the R-squared (R**2) that many of you have already encountered.  
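&lt;br /&gt;&lt;br /&gt;For those who would like to see the arithmetic, here is a minimal Python sketch of LOO for a straight-line fit to a single descriptor. We have assumed the usual definition q**2 = 1 - PRESS/SS, where PRESS is the sum of squared errors in predicting each left-out point and SS is the total sum of squares about the mean; the function names and data are made up.&lt;pre&gt;
```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """R**2 for the fit to all of the data."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(x, y):
    """Leave-one-out q**2: refit without each point, then predict that point."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Made-up descriptor values and pIC50s.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [5.1, 5.9, 7.2, 7.8, 9.1, 9.9]
print(round(r_squared(x, y), 3), round(q_squared_loo(x, y), 3))
```
&lt;/pre&gt;For a least-squares fit, each left-out point can no longer pull the line towards itself, so PRESS can never be smaller than the ordinary residual sum of squares and q**2 comes out no larger than R**2.&lt;br /&gt;&lt;br /&gt;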
We’ll talk a bit more about these quantities in the next Crapshoot and we’ll also tell you a bit more about why LOO might give you an optimistic view of model quality for a dataset like this.   Until then please try to keep yourselves busy, motivated and within regions of chemical space acceptable to &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellow&lt;/a&gt;.       &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/_9pt-rDMtsM4/S8GUTLa50YI/AAAAAAAAABk/w5G4hXnoT8M/s1600/gmc05a.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://1.bp.blogspot.com/_9pt-rDMtsM4/S8GUTLa50YI/AAAAAAAAABk/w5G4hXnoT8M/s400/gmc05a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5458807280553808258" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2010/04/in-loo-of-crapshoot.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5728513273391273814?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5728513273391273814/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5728513273391273814' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5728513273391273814'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5728513273391273814'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2010/04/who-guards-guardians.html' title='Who 
guards the guardians?'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_9pt-rDMtsM4/S8GUiG6d4YI/AAAAAAAAABs/yGV8zVWpa_I/s72-c/gmc04a.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-651980463749676087</id><published>2010-04-01T21:26:00.000-07:00</published><updated>2010-09-26T15:06:27.311-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='latent indicator variable'/><category scheme='http://www.blogger.com/atom/ns#' term='qsar'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><title type='text'>The Latent Indicator Variable revisited</title><content type='html'>In the &lt;a href="http://gmc2007.blogspot.com/2010/03/shakespearean-qsar.html" target="_window"&gt;previous Crapshoot&lt;/a&gt; we introduced (and greatly enjoyed) ‘&lt;a href="http://dx.doi.org/10.1021/ci0342472" target="_window"&gt;QSAR: dead or alive?&lt;/a&gt;’ with its warning of “... vast number of studies with sufficiently poor predictive qualities to underscore a growing shadow of doubt on an ever-darkening correlative landscape” and are seriously considering investing in a small German start-up company that specialises in the manufacture of stork cages.  It would appear that we are “... entangled in a descriptor jungle, unsure of how many and what types to use” and we have been reminded of the tale of &lt;a href="http://en.wikipedia.org/wiki/Tar_baby" target="_window"&gt;The Tar Baby&lt;/a&gt; on more than one occasion when reading papers on QSAR modelling.  
Except that there was never any briar patch.&lt;br /&gt;&lt;br /&gt;We thought that it would be a good time to return to the &lt;a href="http://gmc2007.blogspot.com/2009/02/latent-indicator-variable.html" target="_window"&gt;Latent Indicator Variable&lt;/a&gt; and have created a couple of graphics which we hope you’ll find pretty.  We’ll discuss these in more detail in the following Crapshoot, so for now we’ll just say a bit about what the graphics are supposed to represent and leave you to enjoy them.  The plots in the figures are idealised situations and we just ‘sketched’ the lines of fit, so if you were to digitise the pictures you’d find that these are not least-squares fits.&lt;br /&gt;&lt;br /&gt;Figure 1 shows potency (pIC50) for two series of compounds.  The green compounds are a series of structural analogues that are more potent against a related enzyme and were identified by testing compounds from the other project.  None of the green compounds showed any great potency against the enzyme of interest and, in fact, their activity against the related enzyme was a bit of a safety worry.  The project chemists are actually synthesising compounds similar to those reported by a competitor, since these are more potent against the enzyme of interest while showing less activity against the related enzyme with the safety issue.  On the horizontal axis is a variable, a linear combination of descriptors that has been found to be predictive of pIC50, which we write in shorthand using the summation symbol.  
You will encounter linear combinations of descriptors time and time again in QSAR studies and we’ll have something to say about them in a future Crapshoot.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/_9pt-rDMtsM4/S7Vylna16KI/AAAAAAAAABU/xAWnhSdJNXU/s1600/gmc04a.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/_9pt-rDMtsM4/S7Vylna16KI/AAAAAAAAABU/xAWnhSdJNXU/s400/gmc04a.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5455392514191517858" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Figure 2 shows a plot of plasma protein binding against logP for some carboxylic acids (green) and some neutral compounds (blue).  On the horizontal axis is logP, the logarithm (to base 10) of the octanol-water partition coefficient.  This quantity should not be confused with logD, although the two are equal for a compound that is essentially un-ionised at the pH of interest.  Even the less observant amongst you will note that increasing logP tends to result in stronger binding to plasma protein, and the more observant readers will see that the carboxylic acids bind more strongly than the neutral compounds.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_9pt-rDMtsM4/S7Vy1sdIy3I/AAAAAAAAABc/93zZVYS8wNg/s1600/lip.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://2.bp.blogspot.com/_9pt-rDMtsM4/S7Vy1sdIy3I/AAAAAAAAABc/93zZVYS8wNg/s400/lip.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5455392790421228402" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;So there you have it.  Two graphical illustrations of the Latent Indicator Variable.  
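&lt;br /&gt;&lt;br /&gt;If you would like to see the effect without squinting at our sketches, here is a minimal Python simulation. It is only an illustration under invented assumptions: the two hypothetical series of 30 compounds, the means and spreads in the code, and the helper names are ours, not data from either figure. Within each series pIC50 is unrelated to the descriptor, yet pooling the two series gives a respectable R-square. Putting series membership into the model as an explicit 0/1 column (in other words, making the latent indicator explicit) leaves almost nothing for the descriptor to explain.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination of fitted values y_hat against y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def ols(X, y):
    """Ordinary least squares with an intercept; returns (fitted values, coefficients)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef, coef


def latent_indicator_demo(n=30, seed=0):
    """Two hypothetical series with no within-series trend but different
    average descriptor values (x) and different average pIC50 (y)."""
    rng = np.random.default_rng(seed)
    x0, x1 = rng.normal(0.0, 0.5, n), rng.normal(3.0, 0.5, n)
    y0, y1 = rng.normal(5.0, 0.4, n), rng.normal(7.0, 0.4, n)  # unrelated to x
    x = np.concatenate([x0, x1])
    y = np.concatenate([y0, y1])
    series = np.concatenate([np.zeros(n), np.ones(n)])  # the latent indicator

    pooled_hat, _ = ols(x, y)
    pooled_r2 = r_squared(y, pooled_hat)
    within_r2 = [r_squared(yg, ols(xg, yg)[0]) for xg, yg in ((x0, y0), (x1, y1))]
    # Refit with the indicator as an explicit column: coef = [intercept, x, series]
    _, coef = ols(np.column_stack([x, series]), y)
    return pooled_r2, within_r2, coef[1]


pooled_r2, within_r2, slope_x = latent_indicator_demo()
print(f"pooled R^2 = {pooled_r2:.2f}")
print("within-series R^2 =", [round(r, 2) for r in within_r2])
print(f"slope on x once the series indicator is in the model = {slope_x:.2f}")
```

The same logic applies to Figure 2 if you replace the series label with acid versus neutral, although there the within-group trend with logP is real.&lt;br /&gt;&lt;br /&gt;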
We’ll continue on this theme in the next Crapshoot and we hope that those of you in countries where Easter is celebrated will enjoy the holiday.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2010/04/who-guards-guardians.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-651980463749676087?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/651980463749676087/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=651980463749676087' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/651980463749676087'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/651980463749676087'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2010/04/latent-indicator-variable-revisited.html' title='The Latent Indicator Variable revisited'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_9pt-rDMtsM4/S7Vylna16KI/AAAAAAAAABU/xAWnhSdJNXU/s72-c/gmc04a.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-7594408653774423097</id><published>2010-03-19T21:09:00.000-07:00</published><updated>2010-09-26T15:06:27.315-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='qsar'/><category 
scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>Shakespearean QSAR</title><content type='html'>We cannot put it off any longer.  We must now return to our &lt;a href="http://gmc2007.blogspot.com/2009/04/latent-indicator-variable-2.html" target="_window"&gt;series of posts&lt;/a&gt; on predictive modelling and QSAR, depressing though this is.  Although the weather back at home in Ontario has been delightful, we are in Peru, very close to the Brazilian frontier, and it has rained continuously for the last four days, reminding us exactly what it is usually like to read an article on the prediction of aqueous solubility or hERG blockade.  On those occasions we are usually prompted to summon an appropriately trained individual, equipped with a captive bolt pistol, to put the offending article out of its (and our) misery.  On a brighter note, we think it is time to share an atypically good article on QSAR and predictive modelling.  It is entitled &lt;a href="http://dx.doi.org/10.1007/s10822-007-9162-7" target="_window"&gt;QSAR: dead or alive?&lt;/a&gt; (DoA) and is a lively and entertaining read.  
The article and &lt;a href="http://dx.doi.org/10.1021/ci0342472" target="_window"&gt;The Problem of Overfitting&lt;/a&gt; (PoO), which has already &lt;a href="http://gmc2007.blogspot.com/2009/01/perils-of-overfitting.html" target="_window"&gt;featured in this column&lt;/a&gt;, should both be read by anyone planning to build, use or be influenced by QSAR models.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://dx.doi.org/10.1007/s10822-007-9162-7" target="_window"&gt;DoA&lt;/a&gt; starts by looking at the difference between correlation and causation.  You’ll remember us &lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-2-sacred-cow-culled.html" target="_window"&gt;questioning in an earlier Crapshoot&lt;/a&gt; whether it is really possible to say that poor oral bioavailability is any more a consequence of too many rotatable bonds than of too high a molecular weight, and we can see &lt;a href="http://dx.doi.org/10.1021/jm901371v" target="_window"&gt;others heading for that particular tar pit&lt;/a&gt;.  Correlation does not mean causation and, in fact, as we’ll demonstrate in a future Crapshoot, correlation may not even mean correlation, although we don’t think that is a particularly helpful place to go right now.  We liked the examples presented in &lt;a href="http://dx.doi.org/10.1007/s10822-007-9162-7" target="_window"&gt;DoA&lt;/a&gt;, such as the eminently sensible proposal to increase the birth rate in Germany by inducing more storks to nest there.  We propose using the incentive of placing (concentrating?) them in cages.  However, what we really want you to look at is Figure 3.&lt;br /&gt;&lt;br /&gt;Figure 3 in &lt;a href="http://dx.doi.org/10.1007/s10822-007-9162-7" target="_window"&gt;DoA&lt;/a&gt; shows an excellent correlation between length and width for a large collection of skulls gathered from the Paris Catacombs, which seemed a lot more dead than alive even when allowing for a Schroedingerian ambiguity on the latter point.  
Actually, there are two groups of skulls: male and female.  Most of the strength in this correlation comes from the absolute differences in skull size between men and women, and if you’re wondering where you’ve seen this before, it’s just our old friend the &lt;a href="http://gmc2007.blogspot.com/2009/02/latent-indicator-variable.html" target="_window"&gt;Latent Indicator Variable&lt;/a&gt; that was introduced in an excruciating sequence of posts last year.  But let’s just move on, because we’ve flogged that very dead horse to a greater extent than is usually considered tasteful.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://dx.doi.org/10.1007/s10822-007-9162-7" target="_window"&gt;DoA&lt;/a&gt; also highlights the &lt;a href="http://dx.doi.org/10.1021/jm00196a017" target="_window"&gt;problem of chance correlation&lt;/a&gt;.  These days you have lots of descriptors with which to craft your predictive model.  Literally buckets of them!  Everything from the kurtosis of quadrupole-scaled atom charges to an entire family of spherical harmonics derived from the trace of the hyperpolarizability tensor.  However, the QSAR modeller’s wet dream is a &lt;a href="http://en.wikipedia.org/wiki/Hieronymus_Bosch" target="_window"&gt;Bosch&lt;/a&gt;-painted nightmare for anybody trying to use the models to gain insight or even that slight edge over the opposition.  The more descriptors you have to play with, the more likely you are to find a significant correlation that:&lt;br /&gt;&lt;br /&gt; “...is a tale&lt;br /&gt;Told by an idiot, full of sound and fury,&lt;br /&gt;Signifying nothing”.&lt;br /&gt;&lt;br /&gt;However, the bad news doesn’t end there, because chemical space is not uniformly occupied.  We’ve already discussed some of the consequences of this in connection with the Latent Indicator Variable.  We believe that an uneven distribution of molecules in chemical space further increases the likelihood of finding a chance correlation.  
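&lt;br /&gt;&lt;br /&gt;The descriptor-bucket problem is easy to demonstrate for yourself. What follows is a minimal sketch on pure noise: a random ‘activity’ for a hypothetical set of 20 compounds is compared with increasing numbers of random ‘descriptors’, and the best single-descriptor R-square is recorded each time. The set size, trial counts, seed and function names are arbitrary choices of ours.

```python
import numpy as np


def best_single_descriptor_r2(n_compounds, n_descriptors, rng):
    """Largest squared Pearson r between a random 'activity' and any one of
    n_descriptors random 'descriptors'. There is no real signal anywhere."""
    y = rng.normal(size=n_compounds)
    X = rng.normal(size=(n_compounds, n_descriptors))
    # Standardise (population SD) so that a scaled dot product is Pearson r.
    yz = (y - y.mean()) / y.std()
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    r = Xz.T @ yz / n_compounds
    return float(np.max(r ** 2))


def median_best_r2(n_compounds=20, counts=(1, 10, 100, 1000), n_trials=200, seed=1):
    """Median, over repeated random data sets, of the best single-descriptor R^2."""
    rng = np.random.default_rng(seed)
    return {
        k: float(np.median([best_single_descriptor_r2(n_compounds, k, rng)
                            for _ in range(n_trials)]))
        for k in counts
    }


for k, r2 in median_best_r2().items():
    print(f"{k:5d} random descriptors: median best R^2 = {r2:.2f}")
```

Nothing in these data means anything, yet the winning R-square rises steadily as the descriptor pool grows. The descriptors here are independent and evenly spread, so the sketch says nothing either way about our belief that uneven occupancy of chemical space makes matters worse.&lt;br /&gt;&lt;br /&gt;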
We’ll talk a bit more about that in the next Crapshoot and will leave you with this most sensible of suggestions from &lt;a href="http://dx.doi.org/10.1021/ci0342472" target="_window"&gt;PoO&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;“If the collection of compounds consists of, or includes, families of close analogues of some smaller number of ‘lead' compounds, then a sample reuse cross-validation will need to omit families and not individual compounds.”&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Now why doesn’t everyone do that?&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2010/04/latent-indicator-variable-revisited.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-7594408653774423097?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/7594408653774423097/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=7594408653774423097' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7594408653774423097'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7594408653774423097'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2010/03/shakespearean-qsar.html' title='Shakespearean QSAR'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-4722273850695320082</id><published>2010-01-20T03:23:00.000-08:00</published><updated>2010-09-26T15:06:27.436-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='house keeping'/><title type='text'>Commenting on The Crapshoot</title><content type='html'>We have been forced to tighten up on comments made on The Crapshoot.  Comments will continue to be moderated, but there will now be a word check and you'll need either a Google account or an OpenID.  We apologise for the extra restrictions, but there are a couple of thick-skinned half-wits who just don't seem to be able to take a hint.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-4722273850695320082?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/4722273850695320082/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=4722273850695320082' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4722273850695320082'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4722273850695320082'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2010/01/commenting-on-crapshoot.html' title='Commenting on The Crapshoot'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-2436684969228393276</id><published>2009-11-29T23:42:00.000-08:00</published><updated>2010-09-26T15:06:27.440-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='organisational'/><category scheme='http://www.blogger.com/atom/ns#' term='pharma life'/><category scheme='http://www.blogger.com/atom/ns#' term='gossip'/><title type='text'>Group Manager to Pharma Fellow in one easy lesson</title><content type='html'>We promised to update you on some gossip from work and our loyal readers (all three of them) know that we do like to keep our promises.   You’ll remember &lt;a href="http://gmc2007.blogspot.com/2008/11/scaling-scientific-eigernordwand-part-2.html" target="_window"&gt;Group Manager &lt;/a&gt;who couldn’t understand why he shouldn’t be listed as an author on everything that any member of the group wanted to publish.  Things had come to a head when Top Gun, having done three years as a post-doc in the lab of a Key Opinion Leader on the East Coast, was a bit disgruntled when Group Manager wanted to treat her like a graduate student. The trouble was that Group Manager was actually not a very good manager and his influencing strategy did not extend beyond setting the Caps Lock and letting rip.  Relations deteriorated further and Top Gun took a secondment to the Emerging Antiviral Therapies Team and made it quite clear that she wasn’t coming back until They did something about Group Manager’s Stalinist micromanagement.  Group Manager’s VP was running out of ideas and in desperation summoned GMC to his office (yes, that desperate!) to see what our slightly unorthodox approach to organisational re-alignment might offer.   
Here is a transcript of the discussion:&lt;br /&gt;&lt;br /&gt;GMC:  Well, there is the obvious solution, as they say, ‘pour encourager les autres’.&lt;br /&gt;VP:  I’m afraid we can’t do that; he is a manager after all.  If we start culling them at that level it’ll only be a matter of time before it gets to the VPs.&lt;br /&gt;&lt;br /&gt;GMC: OK, what about the ‘Two Paths, One Mission’ initiative?  You can get one of those half-wits in Human Resources to move him from Management to Science with a handful of keystrokes.  Make him a Junior Pharma Fellow (JPF) and your problem is solved.&lt;br /&gt;VP: I’m afraid that it’s not that simple.&lt;br /&gt;&lt;br /&gt;GMC:  How so?  Aren’t Group Manager and JPF equivalent roles in the ‘Two Paths, One Mission’ initiative?&lt;br /&gt;VP:  Well, the roles are equivalent but the salary scales are different.  If you look at salary, Group Manager is actually equivalent to Pharma Fellow (PF).&lt;br /&gt;&lt;br /&gt;GMC:  So why do you say that Group Manager is an equivalent role to JPF when Group Manager is actually an equivalent role to PF?&lt;br /&gt;VP:  Well, the management puts an equal value on the Group Manager and JPF roles but market data mean that we have to pay Group Manager more.&lt;br /&gt;&lt;br /&gt;GMC:  So what exactly is this market data?  Who analyses it?&lt;br /&gt;VP:  The Human Resources people keep the data and it is highly confidential.  Even I don’t get to look at it.&lt;br /&gt;&lt;br /&gt;GMC:  Well, it looks like you’re going to have to convert him to PF.  With the market data secure in the HR information black hole, you should be able to do whatever you like and cite data that nobody will ever see in support of your decision.&lt;br /&gt;VP:  We probably could, but the problem is that going from Group Manager to PF is technically a promotion even if it doesn’t involve a salary increase.&lt;br /&gt;&lt;br /&gt;GMC:  I see the problem.  
You need to justify the promotion and you’ll have to tell everyone what a great scientist he is when he hasn’t been corresponding author on a journal publication since 2001.&lt;br /&gt;VP: Precisely!&lt;br /&gt;&lt;br /&gt;GMC:  Well let’s see what we might use.  Didn’t he help organise some conference and didn’t some university give him an Honorary Professorship?&lt;br /&gt;VP: I’m not sure about the Honorary Professorship.  He actually asked one of his friends there if they could sort it and things like this are really pretty worthless these days.  You can buy them by giving somebody a juicy slot at a conference that you’re organising.  It really is that simple and cheap!&lt;br /&gt;&lt;br /&gt;GMC:  OK why don’t we say that he’s a Key Opinion Leader (KOL)?&lt;br /&gt;VP:  Nice idea but there would be problems because we're already calling &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellow&lt;/a&gt; (SPF) a KOL.  I mean they can’t both be KOLs because SPF would throw a hissy fit.  You know what his ego is like.&lt;br /&gt;&lt;br /&gt;GMC:  OK let’s call SPF ‘Thought Leader’.  Wouldn’t that be tidy?  Then you can call PF a KOL without offending SPF.&lt;br /&gt;VP:  What an excellent idea!  I must have thought of it myself.  But we still need to create a role for PF.&lt;br /&gt;&lt;br /&gt;GMC:  That shouldn’t be a problem.  You can say he’s providing leadership for the JPFs. &lt;br /&gt;VP:  Are you sure?  The JPFs are a particularly tiresome group and they’re unlikely to fall for the KOL farce.  They’re also a lot stronger scientifically than PF so there could be real problems.&lt;br /&gt;&lt;br /&gt;GMC:  Well you didn’t like our first suggestion so I think this is all you can do.   He doesn’t need to actually provide leadership for the JPFs.  You just need to say that he’s providing leadership and the organisational inertia will do the rest.  
How about suggesting that he get them to write up some research proposals.  That’ll create an illusion of leadership.&lt;br /&gt;VP: Not so sure about the research proposal idea.  I mean there’s no resource for that sort of thing.&lt;br /&gt;&lt;br /&gt;GMC:  The lack of resource is exactly why it is a good idea.  If those meddlesome JPFs are continually writing proposals for projects that will never be resourced, they won’t have the time to create trouble.&lt;br /&gt;VP:  What a masterstroke!  I almost hadn’t realised that I'd thought of it before. But I have one last question.  I’m concerned that appointing Group Manager as PF and calling him a KOL will lose me respect among the scientific community.&lt;br /&gt;&lt;br /&gt;GMC:  That’s one thing that you don’t need to worry about.&lt;br /&gt;VP:  How can you be sure?&lt;br /&gt;&lt;br /&gt;GMC: The scientific community in this company stopped taking you seriously years ago.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Any similarity between the characters in this Crapshoot and persons alive or dead is entirely coincidental.  
No children, animals, VPs, Group Managers, Pharma Fellows or Senior Pharma Fellows were harmed in the preparation of this Crapshoot.&lt;/em&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-2436684969228393276?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/2436684969228393276/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=2436684969228393276' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2436684969228393276'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2436684969228393276'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2009/11/group-manager-to-pharma-fellow-in-one.html' title='Group Manager to Pharma Fellow in one easy lesson'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-8214361357114502264</id><published>2009-10-27T16:47:00.000-07:00</published><updated>2010-09-26T15:06:27.444-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='categorical sin'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><title type='text'>Is promiscuity categorically sinful?</title><content type='html'>We thought that a categorical sin would represent an excellent way to return to shooting the crap as we like to say.  
This particular categorical sin is connected with promiscuity, although it is not clear whether the latter should be regarded as vice or virtue.  We will return to that question at a later date, but for the present we simply invite you to fasten your seatbelt, sit back and wallow in the sheer, undiluted sinfulness of it all.&lt;br /&gt;&lt;br /&gt;Take a look at today’s &lt;a href="http://dx.doi.org/10.1016/j.sbi.2006.01.013" target="_window"&gt;featured article&lt;/a&gt;, but don’t bother to read it if you’re in a hurry.  Just go to Figure 2 because that’s where all the action happens.  This figure claims to illustrate the relationship between promiscuity and molecular weight.  Promiscuity is defined by the number of targets that the compound inhibits with an IC50 of less than 10 micromolar.  As an aside, it should be mentioned that 10 micromolar inhibition in an in vitro assay does not necessarily translate into in vivo inhibition.  You need to know blood levels to answer that question.  More precisely, free blood levels, but we’re not going to go there today because it’s an even worse place than the Laotian monastery!&lt;br /&gt;&lt;br /&gt;Anyway, back to Figure 2.  There are a number of similarities between this one and the one illustrating the relationship between promiscuity and lipophilicity that starred in an &lt;a href="http://gmc2007.blogspot.com/2008/05/breaking-stone-in-changi.html" target="_window"&gt;earlier Crapshoot&lt;/a&gt;.  Promiscuity is an integer and so all you need to do is calculate the average molecular weight for each value of promiscuity and you’re ready to plot.  Isn’t Key Opinion Leadership easy!  Well, these guys did the plot and got an R-square of 0.93.  Does this mean that you’d get an R-square of 0.93 for the raw data?  We suspect not.&lt;br /&gt;&lt;br /&gt;Our Loyal Readers know only too well by now what makes us a little queasy about plots like this.  
Very simply, variation is hidden, and for those of you who’ve just joined us we’ll try to explain.  Take a look at the point that has been plotted for compounds hitting just one target.  The mean molecular weight for these compounds is about 430 Da and, for the sake of discussion, let’s say that the mean is exactly 430 Da.  You could get this value if all these compounds had a molecular weight (MW) of 430 Da, but you’d get the same value if half the compounds had a MW of 230 Da and the other half had a MW of 630 Da.&lt;br /&gt;&lt;br /&gt;Figure 2 is a plot of the trend in the data and not of the data itself, and we’re not sure what the R-square for the trend really means.  We have already noted that the R-square you get from treating the data in this manner depends on the number of levels of promiscuity.  How can you make this statement, M. le Crapshoot, without even looking at the data, we hear you cry.  Patience, Esteemed Readers; you really should have read your back copies of The Crapshoot more carefully.  You’ll see that the largest number of assays in which any compound is active is 18.  Let’s call compounds that hit 1-9 assays less promiscuous and assign them an integer of 1.  The other compounds we’ll call really promiscuous (we’ll also call the vice squad) and assign them an integer of 2.  Now plot the average value of any parameter or property that you like for each group of compounds against the integer that you’ve assigned, and you’ll get an R-square of 1, because a straight line always passes exactly through two points.  The only exception is when the two averages happen to be identical, in which case there is no variation left to explain.  This is but one of the manifestations of categorical sin.&lt;br /&gt;&lt;br /&gt;Now you may think we’re being a bit harsh on the nice folk named in the footnote to Figure 2.  After all, they are honest enough to state that the standard deviation for each promiscuity value is high, and they also show that the highly promiscuous compounds are much less numerous than the less promiscuous compounds.  
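&lt;br /&gt;&lt;br /&gt;For the doubters, here is a minimal sketch of how averaging flatters a plot. The numbers are invented: promiscuity runs from 1 to 18, low levels are far more heavily populated than high ones, and molecular weight drifts up by a hypothetical 8 Da per level on top of 100 Da of scatter. The same simulated compounds are then fitted three ways: every compound, one mean per level, and the two bins described above. The function names and all the parameters are our own assumptions.

```python
import numpy as np


def r2_line(x, y):
    """R-square of an ordinary least-squares straight line of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return float(1.0 - np.sum(resid ** 2) / np.sum((y - np.mean(y)) ** 2))


def simulate(seed=0):
    """Hypothetical data: promiscuity 1-18, many more compounds at low levels,
    a weak upward drift in MW (8 Da per level) buried in 100 Da of scatter."""
    rng = np.random.default_rng(seed)
    levels = np.arange(1, 19)
    counts = np.maximum(5, (400 / levels).astype(int))
    prom = np.repeat(levels, counts).astype(float)
    mw = 400 + 8 * prom + rng.normal(0, 100, prom.size)
    return prom, mw


prom, mw = simulate()
raw_r2 = r2_line(prom, mw)  # every compound is a point

levels = np.unique(prom)
level_means = np.array([mw[prom == k].mean() for k in levels])
means_r2 = r2_line(levels, level_means)  # one point per promiscuity level

# Two bins as in the text: 1-9 assays become 1, 10-18 assays become 2
group = np.where(prom >= 10, 2.0, 1.0)
bin_means = np.array([mw[group == g].mean() for g in (1.0, 2.0)])
two_bin_r2 = r2_line(np.array([1.0, 2.0]), bin_means)

print(f"R^2, {prom.size} individual compounds: {raw_r2:.2f}")
print(f"R^2, {levels.size} level means: {means_r2:.2f}")
print(f"R^2, 2 bin means: {two_bin_r2:.2f}")
```

The compound-level R-square stays small, the level means give a much larger one, and the two-bin version gives 1 (to within rounding) whatever the data. None of these numbers is meant to reproduce the 0.93 in the featured article.&lt;br /&gt;&lt;br /&gt;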
Figure 2 would have been greatly improved by displaying the standard deviation for each promiscuity level, and Our Loyal Readers know only too well that it is the standard deviations which must be shown and not the standard errors: the standard error describes how precisely a mean is known and shrinks as more compounds are added, whereas the standard deviation describes the spread that the plot is hiding.  Showing standard errors instead would be &lt;a href="http://gmc2007.blogspot.com/2008/07/desperately-seeking-signifcance.html" target="_window"&gt;truly sinful&lt;/a&gt;, and we hope that Sensitive Readers will not have been offended by all this talk of promiscuity and standard deviants.&lt;br /&gt;&lt;br /&gt;So what’s to be done about Figure 2?  First, we should point out that one justification for plotting the data in this manner is that the creators of Figure 2 appear to be trying to explore the response of promiscuity to molecular weight.  Most of the compounds in the data set are relatively non-promiscuous and would dominate if all data points were used.  There are a couple of options for dealing with this problem.  The first is simply to show the standard deviation for each promiscuity level.  A second option would be to create a new data set by randomly selecting a fixed number of compounds for each promiscuity level.  This new data set would be a lot more suitable for regression analysis, and you could also make molecular weight the independent variable, which is more appropriate if you’re thinking of promiscuity as a response to molecular weight.  If we were going down this route, we would also include compounds that don’t show activity in any assay.&lt;br /&gt; &lt;br /&gt;So there you have it.  
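&lt;br /&gt;&lt;br /&gt;Before we go, here is a minimal sketch of the two remedies described above, again on invented numbers. The simulated data, the function names, the 20 compounds drawn per level and the random seeds are all our assumptions, not anything taken from the featured article. The code prints the mean, standard deviation and standard error of molecular weight at each promiscuity level, including an inactive level 0, and then draws the same number of compounds at random from every level before regressing promiscuity on molecular weight.

```python
import numpy as np


def simulate(seed=0):
    """Hypothetical data, now including inactive compounds (promiscuity 0);
    at least 25 compounds per level, many more at low levels."""
    rng = np.random.default_rng(seed)
    levels = np.arange(0, 19)
    counts = np.maximum(25, (400 / (levels + 1)).astype(int))
    prom = np.repeat(levels, counts).astype(float)
    mw = 400 + 8 * prom + rng.normal(0, 100, prom.size)
    return prom, mw


def summarise(prom, mw):
    """(level, n, mean MW, SD, SE) for each promiscuity level."""
    rows = []
    for k in np.unique(prom):
        v = mw[prom == k]
        sd = v.std(ddof=1)
        rows.append((int(k), v.size, v.mean(), sd, sd / np.sqrt(v.size)))
    return rows


def balanced_indices(prom, per_level, rng):
    """Indices of per_level compounds drawn at random, without replacement,
    from every promiscuity level."""
    return np.concatenate([
        rng.choice(np.flatnonzero(prom == k), size=per_level, replace=False)
        for k in np.unique(prom)
    ])


prom, mw = simulate()
for level, n, mean, sd, se in summarise(prom, mw):
    print(f"{level:2d}  n={n:3d}  mean={mean:6.1f}  SD={sd:5.1f}  SE={se:5.1f}")

rng = np.random.default_rng(42)
idx = balanced_indices(prom, per_level=20, rng=rng)
# Molecular weight as the independent variable, promiscuity as the response
slope, intercept = np.polyfit(mw[idx], prom[idx], 1)
print(f"{idx.size} compounds; promiscuity = {slope:.4f} * MW + {intercept:.2f}")
```

Note how the standard error at level 0, where compounds are plentiful, is far smaller than the standard deviation, which is exactly why it flatters a plot.&lt;br /&gt;&lt;br /&gt;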
As categorical sins go, this one is actually not too sinful and shouldn’t result in anything more than a transient stay in data analytic purgatory for hiding variation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-8214361357114502264?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/8214361357114502264/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=8214361357114502264' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8214361357114502264'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8214361357114502264'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2009/10/is-promiscuity-categorically-sinful.html' title='Is promiscuity categorically sinful?'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-7617750033294648195</id><published>2009-10-06T17:05:00.000-07:00</published><updated>2010-09-26T15:06:27.481-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='travel'/><title type='text'>Summer is over and The Crapshoot is back!</title><content type='html'>Summer recess is over and having been goaded into action by one of our readers (whom we believe to number about half a dozen), it is time to re-start shooting the crap as we like to say.  
We have been spending the last few months in a Laotian monastery where you will not find internet, telephones or (most importantly) white-coated orderlies.  Needless to say, this environment is not conducive to blogging or in fact compliance with our medication.&lt;br /&gt;&lt;br /&gt;We will return to the gratuitous bashing of predictive modelling in due course but will first share (in the next post) a most naughty categorical sin.   This has inspired us to create a ‘categorical sin’ label so that Loyal and Discerning Readers of The Crapshoot can indulge in sins of this nature more easily.  There is also plenty of juicy gossip to catch up on because Group Manager has been ‘promoted’ to Pharma Fellow and his manager is desperately trying to re-package him as a Key Opinion Leader to the great amusement of those whose opinions he is supposed to be leading.    Even, or should we say especially, the janitorial staff are enjoying the joke.  Stay tuned!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-7617750033294648195?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/7617750033294648195/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=7617750033294648195' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7617750033294648195'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7617750033294648195'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2009/10/summer-is-over-and-crapshoot-is-back.html' title='Summer is over and The Crapshoot is back!'/><author><name>Georg-Martin 
Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5877285000917278512</id><published>2009-04-20T13:48:00.000-07:00</published><updated>2010-09-26T15:06:27.486-07:00</updated><title type='text'>Another year, another Senior Pharma Fellow</title><content type='html'>It is now 2 years since we started The Crapshoot and, 58 posts and 20k pageloads later, it has still not been put out of its misery.  The second year has been less eventful than the first in that we received no death threats, something that greatly disappoints us.  This year marked the debut of one of our favorite characters, whom we have named &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellow&lt;/a&gt;, although you will know him by any of a number of names that tact prevents us from mentioning.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5877285000917278512?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5877285000917278512/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5877285000917278512' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5877285000917278512'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5877285000917278512'/><link rel='alternate' type='text/html' 
href='http://gmc2007.blogspot.com/2009/04/another-year-another-senior-pharma.html' title='Another year, another Senior Pharma Fellow'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-3952128900668955234</id><published>2009-04-01T14:18:00.000-07:00</published><updated>2010-09-26T15:06:27.487-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='latent indicator variable'/><category scheme='http://www.blogger.com/atom/ns#' term='qsar'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><title type='text'>The latent indicator variable 2</title><content type='html'>The toy example in the &lt;a href="http://gmc2007.blogspot.com/2009/02/latent-indicator-variable.html" target="_window"&gt;previous post&lt;/a&gt; is clearly a bit of an over-simplification although it is useful for illustration of some ideas. With only two substituents, it should be pretty obvious to all but the most witless when compounds with one substituent are more active than the corresponding compounds with the other substituent.&lt;br /&gt;&lt;br /&gt;Things get a bit more complicated when you have a number of substituents.  Time for another of The Crapshoot’s annoying toy examples, for which we make no apology.  If you find reading this garbage to be a painful experience then please spare a thought for those of us who have to write it.  &lt;br /&gt;&lt;br /&gt; Suppose you can now have one of 5 substituents at a particular position instead of just chlorine and the ‘un-substituent’ hydrogen.  
Let’s also assume classic Free-Wilson linearity-additivity in the SAR, such that each substituent makes a constant (and different) contribution to activity.  Although this is a rather contrived system, it is not too different from the situation in MedChem projects where a well-defined ranking of substituents is observed that is independent of what may be present at other positions of diversity in the molecule.  If we’ve got 5 compounds, each with a different one of these 5 substituents, you should be able to fit whatever biological activity you observe using 5 different substituent parameters, provided that the parameters take different values for the different substituents (strictly, provided that the 5 × 5 table of parameter values is not singular).  For example you might use sigma meta, sigma para, sigma resonance, sigma inductive, volume, cube root of the trace of the substituent polarizability tensor, &lt;em&gt;ad nauseam&lt;/em&gt;.  The key point is that it just doesn’t matter which parameters you choose, as long as they take different values for each substituent.  This is the curse of the Latent Indicator Variable.&lt;br /&gt;&lt;br /&gt;Now 5 adjustable parameters and 5 compounds would really look rather like over-fitting.  But suppose we’ve done this combinatorially and have another position of diversity (let’s call it B) at which we can have one of 10 substituents.  Now there are 10 compounds with each one of the 5 original substituents (let’s call these the position A substituents).  Now here’s the fun bit, and don’t worry because we’ll hold your hand so we can do it together.  We’re going to take the average pIC50 for compounds with each of the 5 position A substituents.  Provided that these averages are all sufficiently different, you’ll get some sort of model when you use all the data points.
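&lt;br /&gt;&lt;br /&gt;For readers who would rather see the curse in action, here is a minimal Python sketch using made-up numbers (none of it comes from a real project).  The position A and position B contributions, the baseline pIC50 of 6.0 and both descriptor tables are hypothetical; the tables are triangular only so that it is obvious they are invertible.  Each table is used to fit pIC50 for all 50 compounds by least squares with 5 coefficients and no intercept.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
from statistics import mean

# Hypothetical additive (Free-Wilson style) toy data: 5 substituents at
# position A, 10 at position B, every combination made once (50 compounds).
A_EFFECT = {"A1": 0.0, "A2": 0.5, "A3": 1.5, "A4": -0.5, "A5": 1.0}
B_EFFECT = [0.1 * j for j in range(10)]
DATA = [(a, b, 6.0 + ea + eb)
        for a, ea in A_EFFECT.items() for b, eb in enumerate(B_EFFECT)]

# Two unrelated, made-up tables of 5 "substituent parameters" for position A.
# They are triangular only so that it is obvious they are invertible.
DESC_1 = {"A1": [2.0, 0.0, 0.0, 0.0, 0.0], "A2": [1.0, 1.0, 0.0, 0.0, 0.0],
          "A3": [0.5, 2.0, 1.0, 0.0, 0.0], "A4": [1.5, 0.5, 3.0, 1.0, 0.0],
          "A5": [1.0, 1.0, 1.0, 2.0, 0.5]}
DESC_2 = {"A1": [1.0, 0.5, 2.0, 0.0, 1.0], "A2": [0.0, 2.0, 1.0, 1.5, 0.5],
          "A3": [0.0, 0.0, 1.0, 0.5, 2.0], "A4": [0.0, 0.0, 0.0, 3.0, 1.0],
          "A5": [0.0, 0.0, 0.0, 0.0, 2.0]}


def solve(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit(desc):
    """Least-squares fit of pIC50 on 5 descriptors (no intercept) over all 50 rows."""
    rows = [desc[a] for a, _, _ in DATA]
    ys = [y for _, _, y in DATA]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    w = solve(xtx, xty)  # normal equations: (X'X) w = X'y
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def looks_like_interpolation(desc, x):
    """True if every descriptor value lies inside its training-set range."""
    return all(max(col) >= v >= min(col) for col, v in zip(zip(*desc.values()), x))


model_1, model_2 = fit(DESC_1), fit(DESC_2)
group_means = {a: mean(y for s, _, y in DATA if s == a) for a in A_EFFECT}
for a in A_EFFECT:
    print(a, round(group_means[a], 3), round(model_1(DESC_1[a]), 3),
          round(model_2(DESC_2[a]), 3))

# A hypothetical new position A substituent, given made-up values in each table.
new_1, new_2 = [1.0, 1.0, 1.0, 1.0, 0.25], [0.5, 1.0, 1.0, 1.0, 1.0]
print(looks_like_interpolation(DESC_1, new_1), looks_like_interpolation(DESC_2, new_2))
print(round(model_1(new_1), 2), round(model_2(new_2), 2))
```
&lt;/pre&gt;Both tables reproduce the 5 position A averages exactly: with an invertible table the model can represent any 5 group values, and least squares then picks the group averages.  A hypothetical new substituent whose descriptor values sit inside the training range for every column of its table (so the unwary would call it interpolation) gets about 6.64 from one model and about 7.27 from the other.&lt;br /&gt;&lt;br /&gt;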
And with all 50 data points, 5 adjustable parameters no longer look quite so naughty.&lt;br /&gt;&lt;br /&gt;The problem is that we’ve used Latent Indicator Variables and, even with 50 data points, this model only works if a compound contains one of the 5 position A substituents that we’ve used to train the model.  Unfortunately this is less easy to spot than when we’ve only got two substituents to worry about.  A compound might sit right at the centroid of the model space and the unwary would say this was interpolation.  Yes, if it contains one of the 5 position A substituents used to train this model, but otherwise No.&lt;br /&gt;&lt;br /&gt;This is probably a good point at which to sign off.  There were so many things we wanted to talk about, like correlations between descriptors, why it doesn’t really make sense to use Hammett constants to model biomolecular recognition and the dangers to Civilisation posed by structural clusters in training sets.  However, enough is enough and we’ll leave you with a problem that anyone who has done some ten pin bowling will be familiar with.  Your first ball has knocked down all the pins except two.  Anyone care to guess which two?  In case you’ve not figured it out, the two pins are numbers 7 and 10.  That’s why they call it a 7-10 split!  
They sit at opposite ends of the back row and the centroid of the model space is not going to be a whole lot of help now.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2010/03/shakespearean-qsar.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-3952128900668955234?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/3952128900668955234/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=3952128900668955234' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3952128900668955234'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3952128900668955234'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2009/04/latent-indicator-variable-2.html' title='The latent indicator variable 2'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-7656259543563058369</id><published>2009-02-28T07:34:00.000-08:00</published><updated>2010-09-26T15:06:27.490-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='latent indicator variable'/><category scheme='http://www.blogger.com/atom/ns#' term='qsar'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><title 
type='text'>The latent indicator variable 1</title><content type='html'>Well it does seem a while since we last posted and there is still much work to do as we continue from the &lt;a href="http://gmc2007.blogspot.com/2009/01/islands-in-chemical-ocean.html" target="_window"&gt;previous post&lt;/a&gt;.  The situation in which you have either chlorine or hydrogen at C4 of the phenyl should be easy to spot using any of a number of substituent parameters, and comparing average pIC50 values for the two groups of compounds will give you a good idea of whether or not substitution with chloro is good for activity.  If substitution with chloro at C4 leads to a consistent increase in potency, you’ll get a model that is both predictive and that can be validated.  So exactly what is your point, we hear you cry.&lt;br /&gt;&lt;br /&gt;OK let’s be a bit more specific.  We’ll use Wikipedia as our source of &lt;a href="http://en.wikipedia.org/wiki/Hammett_equation" target="_window"&gt;Hammett sigma constants&lt;/a&gt;.  The Hammett sigma constant for meta-chloro is +0.37 and (by definition) that for hydrogen is zero.  If chloro substitution leads to a significant increase in potency you should get a reasonable model by fitting pIC50 to sigma.  It will satisfy validation criteria and &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellow&lt;/a&gt; (SPF) will be able to rattle off an impressive array of quality control metrics in his next presentation.  Aren’t we clever!  Surely it’s time to use the model to do some predicting.&lt;br /&gt;&lt;br /&gt;Our chemists want to know what happens if we introduce methoxy or fluoro at C4.  
Actually they don’t like &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellow&lt;/a&gt; (SPF) any more than we do, but there is a directive from the Project Management Politburo that these models are to be used even if they are not believed.  Furthermore you need to run the model so that you can tick the relevant boxes on the Authorisation For Synthesis form that the tiresome Black-Belted Half-Wits have set up for the gathering of Base-line Productivity Indicators.  At least we know that we won’t be extrapolating, because the Hammett sigma values for meta-methoxy and meta-fluoro are +0.11 and +0.34 respectively, so both lie within the space spanned by the training set.  We’d predict that replacing chloro at C4 with fluoro would lead to a small drop in potency because the relevant Hammett sigma values are so similar.  We’d be particularly confident in our predictions for the methoxy-substituted analogs because predicting them is even more of an interpolation than predicting the compounds with which the model was built.&lt;br /&gt;&lt;br /&gt;Now for the sake of argument, let’s suppose we’d decided to use the Hammett constants for these substituents at the para position.  The value for chlorine is now +0.23 and that for hydrogen is still zero (by definition), so the quality of the model is unchanged: with only two distinct sigma values, switching from one set of constants to the other merely rescales the fitted slope.  However fluoro (sigma-para = +0.06) now looks much more like hydrogen than chloro, while methoxy (sigma-para = -0.27) now lies well outside the space spanned by the training set.  Needless to say, this is a very different picture from what we saw using sigma-meta values.&lt;br /&gt;&lt;br /&gt;What does this all mean?  This is obviously a toy example that we’ve created to illustrate a point.  However it is clear that if we’re building models using pIC50s for compounds that are either unsubstituted or have chloro at C4, then sigma-para will work just as well as sigma-meta.
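&lt;br /&gt;&lt;br /&gt;A minimal Python sketch of this toy example follows.  The Hammett constants are the ones quoted above; the pIC50 values are invented for illustration and simply build in a one log unit boost for chloro.  The fit is ordinary least squares of pIC50 on sigma, done once with sigma-meta and once with sigma-para.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
from statistics import mean

# Hypothetical pIC50 values for the toy series: C4 is either H or Cl, and Cl
# is worth roughly one log unit (the numbers themselves are made up).
TRAINING = [("H", 5.8), ("H", 6.0), ("H", 6.2), ("Cl", 6.8), ("Cl", 7.0), ("Cl", 7.2)]

# Hammett constants as quoted in the text.
SIGMA_META = {"H": 0.0, "Cl": 0.37, "F": 0.34, "OMe": 0.11}
SIGMA_PARA = {"H": 0.0, "Cl": 0.23, "F": 0.06, "OMe": -0.27}


def build_model(sigma):
    """Fit pIC50 = a + b*sigma by ordinary least squares; return a predictor."""
    xs = [sigma[s] for s, _ in TRAINING]
    ys = [p for _, p in TRAINING]
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    lo, hi = min(xs), max(xs)

    def predict(subst):
        x = sigma[subst]
        # Second value: is sigma inside the range spanned by the training set?
        return a + b * x, hi >= x >= lo

    return predict


meta_model, para_model = build_model(SIGMA_META), build_model(SIGMA_PARA)
for subst in ("H", "Cl", "F", "OMe"):
    (pm, in_m), (pp, in_p) = meta_model(subst), para_model(subst)
    print(f"{subst:3s}  sigma-meta: {pm:.2f} (in range: {in_m})  "
          f"sigma-para: {pp:.2f} (in range: {in_p})")
```
&lt;/pre&gt;Both fits give identical predictions (6.00 and 7.00) for the hydrogen and chloro training compounds, but for fluoro the sigma-meta model gives 6.92 and the sigma-para model 6.26, and for methoxy they give 6.30 (inside the training range) and 4.83 (outside it).&lt;br /&gt;&lt;br /&gt;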
The sigma values function as indicator variables, and any parameter that takes different values for chloro and hydrogen substituents will do the job just as well.  The problem is that, for these models, anything other than hydrogen or chloro at C4 represents an extrapolation, even though the continuous nature of sigma constants suggests that we might be interpolating.  Real models are typically a lot more complex than this toy example and it is often not clear when linear combinations of continuous variables are actually functioning as indicator variables.  We’ll pick this up in the next post since it is getting late and there is cider to be drunk.  It should be fun and hopefully we will not encounter a latent indicator variable (LIV).&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2009/04/latent-indicator-variable-2.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-7656259543563058369?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/7656259543563058369/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=7656259543563058369' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7656259543563058369'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7656259543563058369'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2009/02/latent-indicator-variable.html' title='The latent indicator variable 1'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-569795109704101055</id><published>2009-01-25T14:58:00.000-08:00</published><updated>2010-09-26T15:06:27.492-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>Islands in the chemical ocean</title><content type='html'>We left you rather abruptly in the &lt;a href="http://gmc2007.blogspot.com/2009/01/perils-of-overfitting.html" target="_window"&gt;previous post&lt;/a&gt;, having been stung by your suggestion that we might be uncouth. However, we have decided to forgive you and continue with our tale.&lt;br /&gt;&lt;br /&gt;We'll start with a scenario with which many of our loyal and patient readers will be familiar. You're optimising a series and have found that adding a chloro substituent at C4 of one of the phenyl rings increases the pIC50 (-log IC50 in concentration units of mol/litre) by a unit regardless of what substituents are present at C3 and C5. Those of you who've worked in drug discovery will have seen this sort of thing. Everybody in the project knows that the 4-chloro substituent is good for potency and if it goes the potency has to be clawed back from somewhere else. Just like tax.&lt;br /&gt;&lt;br /&gt;This sort of thinking is the basis of &lt;a href="http://dx.doi.org/10.1021/jm00334a001" target="_window"&gt;Free-Wilson analysis&lt;/a&gt;. The C4 chlorine and the hydrogen of the unsubstituted C4 can each be thought of as contributing to potency. The contribution of the chlorine is a log unit greater than that of hydrogen. So you've recognised this pattern in your project data but this isn't good enough. 
What do you mean, "not good enough"? You have quite some nerve, M. le Crapshoot. Nothing to do with us. The Chemistry Discipline Review Committee have decided that they'd really prefer that you did this sort of thing with some equations rather than this uncultured chemical structure stuff. Also &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellow&lt;/a&gt; (SPF) needs some equations for the presentation slides that his secretary is preparing for him. Can't you just generate some predictive models instead of being so difficult?&lt;br /&gt;&lt;br /&gt;Well you didn't handle that very well, did you? Anyway stop complaining because you've got work to do. You do some modelling and you find that the Hammett sigmas (both meta and para) for the C4 substituent are useful predictors of pIC50, as are the substituent hydrophobicity parameter and the molar mass of the substituent. Then you make a startling discovery.&lt;br /&gt;&lt;br /&gt;The molecules with which you're building the models either have chlorine at C4 or are unsubstituted at this position.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2009/02/latent-indicator-variable.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-569795109704101055?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/569795109704101055/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=569795109704101055' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/569795109704101055'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/8876332030448981936/posts/default/569795109704101055'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2009/01/islands-in-chemical-ocean.html' title='Islands in the chemical ocean'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-6625684465481508220</id><published>2009-01-02T05:20:00.000-08:00</published><updated>2010-09-26T15:06:27.499-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>The perils of overfitting</title><content type='html'>We do seem to have let things slip over the holiday period but rest assured, Esteemed Readers, that we have not abandoned you or, for that matter, William of Ockham, who is familiarising himself with the health and safety implications of bringing razors to work.  Following on from the &lt;a href="http://gmc2007.blogspot.com/2008/12/where-are-models.html" target="_window"&gt;last post&lt;/a&gt;, we take a look at an &lt;a href="http://dx.doi.org/10.1021/ci0342472"&gt;article on overfitting&lt;/a&gt;.  Unlike much of the literature we review in this column, we rather like this article and think it's a real shame that many of the folk who publish predictive models are palpably unaware of its existence. &lt;br /&gt;&lt;br /&gt;What is overfitting, one might ask?  The author notes that, "Occam's razor, or the principle of parsimony, calls for using models and procedures that contain all that is necessary for the modeling but nothing more".  
How aptly put!  Blofeld and Random Forest had better mind their step.  You might ask what all this means and we will respond that the predictive models that we're talking about all use adjustable parameters to fit the data.  To some extent, the more parameters that you use, the better the fit that you'll achieve.  One definition of overfitting is using a model that has too many parameters, or at least more than you needed.&lt;br /&gt;&lt;br /&gt;By now you'll have figured out why we are keen to see the actual models rather than just reading about their wonderful r-squares, q-squares and root mean square errors.  When the actual model is presented you can see exactly how many parameters it uses.  "But isn't this just the number of descriptors?", you might ask.  Basically yes, if it's a linear regression model and you don't count the intercept as a parameter.  Once you enter the non-linear world of neural nets this is not the case any more.  We don't believe that any journal that wishes to be considered respectable should be publishing new predictive models unless these models are fully specified in the article.&lt;br /&gt;&lt;br /&gt;So we hope we've now got your attention.  We're predicting solubility and have two models at our disposal, both of which have satisfied validation criteria.  "What are these validation criteria?", we hear you cry.  Fair point!  The models satisfactorily predicted the solubilities of compounds that were not used to train them.  We'll discuss validation in a future post because to do so here would get us bogged down in the data-analytic equivalent of Passchendaele.  Anyway, back to the models.  We'll use root mean square error (RMSE) as our measure of model quality.  There are problems with this, but we're trying to make a different point here and RMSE will serve well enough for that.  One model predicts log(S/M) with RMSE = 0.3 using 100 parameters; the other uses 3 parameters and predicts log(S/M) with RMSE = 0.4.
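&lt;br /&gt;&lt;br /&gt;To see why a lower RMSE from a model with more parameters is not, by itself, a reason to prefer it, here is a minimal Python sketch with made-up data.  The two toy models (a straight line and a polynomial that passes through every training point) are stand-ins chosen for illustration and are not the models discussed in this post; the data are a hypothetical straight-line trend with alternating errors of 0.1 units.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
import math
from statistics import mean

# Hypothetical data: an underlying straight line y = x plus alternating
# +/-0.1 "measurement error" at five training points.
TRAIN_X = [0.0, 1.0, 2.0, 3.0, 4.0]
TRAIN_Y = [x + 0.1 * (-1) ** i for i, x in enumerate(TRAIN_X)]
# Hypothetical held-out points read off the underlying trend
# (x = 5.0 lies outside the training range).
TEST_X = [0.5, 2.5, 5.0]
TEST_Y = list(TEST_X)


def fit_line(xs, ys):
    """2-parameter model: ordinary least-squares straight line."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x


def fit_interpolant(xs, ys):
    """5-parameter model: degree-4 polynomial through every training point (Lagrange form)."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f


def rmse(model, xs, ys):
    """Root mean square error of a model's predictions."""
    return math.sqrt(mean((model(x) - y) ** 2 for x, y in zip(xs, ys)))


line_model = fit_line(TRAIN_X, TRAIN_Y)
poly_model = fit_interpolant(TRAIN_X, TRAIN_Y)
for name, model in (("2 parameters", line_model), ("5 parameters", poly_model)):
    print(f"{name}: training RMSE {rmse(model, TRAIN_X, TRAIN_Y):.3f}, "
          f"held-out RMSE {rmse(model, TEST_X, TEST_Y):.3f}")
```
&lt;/pre&gt;The straight line misses each training point by about 0.1 units but stays within 0.02 units of the underlying trend at the held-out points, while the 5-parameter curve reproduces the training data exactly and is off by 3.1 units at x = 5.0.&lt;br /&gt;&lt;br /&gt;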
The observant amongst you will have noticed that we're actually using the logarithm of the molar solubility rather than the solubility itself, and there are really good reasons for doing this which we'll not go into right now.  Anyway, which model are you going to use to make your predictions?&lt;br /&gt;&lt;br /&gt;Being regular readers of The Crapshoot has of course made you cynical.  You've seen some of the underhand tricks that folk can use to persuade you that the trends they have uncovered are stronger than they actually are (see examples &lt;a href="http://gmc2007.blogspot.com/2008/05/breaking-stone-in-changi.html" target="_window"&gt;1&lt;/a&gt; and &lt;a href="http://gmc2007.blogspot.com/2008/05/breaking-stone-in-changi.html" target="_window"&gt;2&lt;/a&gt; to get an idea of what we mean).  The models have both been validated but you can't see the details, so you're quite right to be suspicious.  You're also thinking that the model with RMSE of 0.4 using 3 parameters is less likely to be overfit than the model with RMSE of 0.3 using 100 parameters.  You might also expect the 3-parameter model to work better for compounds that are not chemically similar to the compounds used to train it.  However, in the predictive modelling world validation is assumed to be valid and we can only ask who guards the guardians.  Once models have been validated, the numbers of descriptors used become irrelevant.&lt;br /&gt;&lt;br /&gt;Let's go back to the &lt;a href="http://dx.doi.org/10.1021/ci060164k" target="_window"&gt;article on solubility prediction&lt;/a&gt; that we mentioned in the previous post.  The cross-validation results for partial least squares (PLS), artificial neural net (ANN), support vector machine (SVM) and random forest (RF) models are given in Table 2.  The cross-validated RMSE is lowest for RF, as is the RMSE for the external test set.  Random Forest is the best model!  Long live Random Forest!  
It was validated so who are you, the uncouth authors of a blog that nobody reads, to question this finding?&lt;br /&gt;&lt;br /&gt;It is true that the readership of The Crapshoot could comfortably assemble in the ensuite portion of a budget London hotel room.  However, we really do object to being called uncouth and so we're going home (and taking our ball with us).  Our parting shot is that we've not quite used up all the ammo from that nice paper on overfitting...&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2009/01/islands-in-chemical-ocean.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-6625684465481508220?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/6625684465481508220/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=6625684465481508220' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6625684465481508220'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6625684465481508220'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2009/01/perils-of-overfitting.html' title='The perils of overfitting'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-1560002641516589026</id><published>2008-12-14T07:12:00.000-08:00</published><updated>2011-01-09T04:26:37.035-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rule of 5'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='update'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><category scheme='http://www.blogger.com/atom/ns#' term='hydrogen bonding'/><title type='text'>Only Connect</title><content type='html'>As a service to our loyal readers, we have installed some forward links so that some of the themes that we have explored can be re-visited in the sequences in which they appeared.  Here are our first posts on &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5-warm-up.html" target="_window"&gt;Rule of 5&lt;/a&gt;, &lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-will-it-do-for-me.html" target="_window"&gt;hydrogen bonding&lt;/a&gt; and &lt;a href="http://gmc2007.blogspot.com/2008/11/models-predictive-or-otherwise.html" target="_window"&gt;predictive modelling&lt;/a&gt; so you can see how it all works.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-1560002641516589026?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/1560002641516589026/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=1560002641516589026' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1560002641516589026'/><link 
rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1560002641516589026'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/12/only-connect.html' title='Only Connect'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-3806301540915615102</id><published>2008-12-12T10:37:00.001-08:00</published><updated>2010-09-26T15:06:27.531-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>Where are the models?</title><content type='html'>So we left you hanging a bit in the &lt;a href="http://gmc2007.blogspot.com/2008/11/names-ockham-william-of-ockham.html" target="_window"&gt;previous post&lt;/a&gt; for which we apologise.  William of Ockham was about to do battle with Random Forest, armed only with what would appear to be a singularly inadequate razor.  We’ll have to apologise again because you’re going to have to wait a while longer for the final showdown.  We realise that many of our patient and loyal readers may not have encountered the sorts of predictive models that William of Ockham is licensed to invalidate and as a public service we’ll take a quick look at 3 publications.  
Our objective in this post is not to review these models but merely to use them to show you why studies like these might be of interest to Mr Ockham.&lt;br /&gt;&lt;br /&gt;In the &lt;a href="http://www3.interscience.wiley.com/journal/93518775/abstract" target="_window"&gt;first article&lt;/a&gt;, industrial researchers present methods for predicting hERG liability in compound libraries using their own data, which were not made available to readers or, presumably, to the reviewers of this paper.  We extend special sympathy to the reviewers of this article because we just can’t tell whether the models described within are useful and highly predictive or of a value that is largely calorific.  This is a general theme which we will re-visit in future posts.  &lt;br /&gt;&lt;br /&gt;In the &lt;a href="http://dx.doi.org/10.1021/jm050200r" target="_window"&gt;second article&lt;/a&gt;, industrial researchers present methods for prediction of volume of distribution.  Volumes of distribution and calculated properties, although not the structures, for the training set compounds were shared as supplemental material.&lt;br /&gt;&lt;br /&gt;In the &lt;a href="http://dx.doi.org/10.1021/ci060164k" target="_window"&gt;third article&lt;/a&gt;, academic researchers present methods for predicting aqueous solubility.  Structures and measured solubilities for training and test sets were shared as supplemental material. &lt;br /&gt;&lt;br /&gt;The authors of these articles share their data sets to varying degrees; however, none appear to be particularly forthcoming with the predictive models themselves.  The second article presents 31 parameter values for a multi-linear regression model in the supplemental material, but the random forest remains an almost complete mystery.  Is it fair that a medicinal chemist needs to provide spectral data for new compounds while a predictive modeller can get away with root mean square error and r-square?  
Don’t ask us for we are simple folk and we just write The Crapshoot.&lt;br /&gt;&lt;br /&gt;So if you think that reading an article on predictive modelling of clearance,  volume, CYP inhibition, hERG blockade, solubility or plasma protein binding is going to provide you with a practical means to predict any of these quantities, you may wish to prepare yourselves for disappointment.&lt;br /&gt;&lt;br /&gt;In the next post, we’ll be taking a look at the ubiquitous problem of over-fitting.  William of Ockham is already sharpening his razor.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2009/01/perils-of-overfitting.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-3806301540915615102?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/3806301540915615102/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=3806301540915615102' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3806301540915615102'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3806301540915615102'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/12/where-are-models.html' title='Where are the models?'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-8551005601431671404</id><published>2008-11-21T15:56:00.000-08:00</published><updated>2010-09-26T15:06:27.534-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><title type='text'>The name’s Ockham, William of Ockham</title><content type='html'>“Ah, 007, we’ve been expecting you”&lt;br /&gt;&lt;br /&gt;William of Ockham turned slowly and looked in the direction of the all too familiar voice.&lt;br /&gt;&lt;br /&gt;“This is my new pet”, exclaimed the evil-looking individual, as he stroked the furry monstrosity on his lap. “Ugly brute isn’t it?  It’s a Zucker rat, bred for obesity rather than good looks.  I can think of many a Pharma organisation that would be better off if it were to replace its &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;Senior Pharma Fellows&lt;/a&gt; with some of these.  MI5 killed my cat, you know.”&lt;br /&gt;&lt;br /&gt;Ockham managed a wry smile.  Q had ingeniously packaged an equine-sized dose of tetrodotoxin into a feline-sized vitamin tablet. “A bad case of the Torsades, it would seem. I hope it didn’t hERG too much”.&lt;br /&gt;&lt;br /&gt;“Come now, 007, enough of this scurvy jest.  I can’t believe that you know that little about ion channels and the alkali metals.  But we have more important matters to discuss before I eliminate you once and for all.  I have the perfect plan and you and those meddlesome half-wits at naval intelligence will be powerless to stop me”.&lt;br /&gt;&lt;br /&gt;“We’ve heard that before, Blofeld.  Last time, weren’t you going to reverse the flow of The Gulf Stream using a Support Vector Machine?”&lt;br /&gt;&lt;br /&gt;“Touché, 007.  
I have to admit that plan was pathologically flawed.  We failed to take account of the fact that the Support Vector Machine was so closely associated with that pseudo-scientific psychobabble called Bioinformatics that nobody took our threat seriously.  But it will be different this time.”&lt;br /&gt;&lt;br /&gt;“They all say that”, replied Ockham.  Every time they had introduced a new Leadership Paradigm at Naval Intelligence, the management consultants would insist that it would be different from all the other times, that it wasn’t an initiative, &lt;em&gt;per absurdum, ad nauseam…&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;“This time, 007, it will truly be different.  Allow me to introduce you to Random Forest”.&lt;br /&gt;&lt;br /&gt;Ockham felt slightly apprehensive at the prospect of facing this unfamiliar new opponent, especially as he recalled the occasion when Q had insisted that he swap his trusty Beretta for a….&lt;br /&gt;&lt;br /&gt;“For fuck sakes, Q, what am I supposed to do with this stupid razor?”&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2008/12/where-are-models.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-8551005601431671404?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/8551005601431671404/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=8551005601431671404' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8551005601431671404'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8551005601431671404'/><link rel='alternate' type='text/html' 
href='http://gmc2007.blogspot.com/2008/11/names-ockham-william-of-ockham.html' title='The name’s Ockham, William of Ockham'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-2378742497362341677</id><published>2008-11-15T12:54:00.000-08:00</published><updated>2010-09-26T15:06:27.535-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='predictive modelling'/><title type='text'>Models, predictive or otherwise</title><content type='html'>Well it has been quite fun bashing those who scale scientific peaks that are all but inaccessible to The Great Unwashed.  We've all met Senior Pharma Fellow and experienced the shock and awe that one so ordinary and talentless could rise so far, so quickly.  Clearly Newtonian physics does not apply to the career trajectories of these individuals.  However it is time to let go of this entertaining topic because we have more pressing matters to discuss.&lt;br /&gt;&lt;br /&gt;In the next series of posts we'll take a look at some of the methodology used to build predictive models.  What sort of predictive models are we talking about?  If you've worked in medicinal chemistry, you'll have encountered these models.  You draw in a chemical structure and the model comes back with a predicted solubility or free fraction in plasma.  In some more draconian regimes, you'll not be allowed to synthesise the compound because the solubility is predicted to be unacceptable.  It gets like the science fiction movie in which people build robots and then become enslaved by them. 
Except this isn't a movie; it's the Pharmaceutical Industry of the late noughties and many of us are in it.&lt;br /&gt;&lt;br /&gt;Are the models any good?  Depends who you ask and when.  Some years ago I remember someone at work making the case for some tree-based pattern recognition software by noting that this methodology is used in the financial services industry to identify individuals and organisations that it would be risky to lend to.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2008/11/names-ockham-william-of-ockham.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-2378742497362341677?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/2378742497362341677/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=2378742497362341677' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2378742497362341677'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2378742497362341677'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/11/models-predictive-or-otherwise.html' title='Models, predictive or otherwise'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-2844979842238222307</id><published>2008-11-09T11:01:00.000-08:00</published><updated>2010-09-26T15:06:27.538-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='organisational'/><category scheme='http://www.blogger.com/atom/ns#' term='pharma life'/><category scheme='http://www.blogger.com/atom/ns#' term='gossip'/><title type='text'>Scaling the Scientific Eigernordwand: Part 2</title><content type='html'>Preparing &lt;a href="http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html" target="_window"&gt;the previous post &lt;/a&gt;reminded us of a description of abseiling as the second fastest way down a mountain.  However, becoming a Senior Pharma Fellow is not the only way up.  Another way is to get in at the start.&lt;br /&gt;&lt;br /&gt;Or at least it used to be. This strategy only works in an expanding organisation so, given the global credit crunch and the current state of the pharmaceutical industry, it’s probably not something around which you want to plan your career.  Nevertheless there are plenty of Pharma people whose important-sounding job titles are largely consequences of having got in early.&lt;br /&gt;&lt;br /&gt;The transition from Scientist to Manager can prove difficult for some individuals.  The manager of a systems biology group who started as the only member of the group will typically feel a little possessive of the group.  Early on, when there were only two other people in the group, this was not too much of a problem.   However, there are now 15 people in the group and a little problem has cropped up.   The Top Gun who joined the group following 3 years in the lab of a Massachusetts-based Key Opinion Leader, funded by a prestigious fellowship, is preparing a manuscript for publication.  
And this is causing a slight problem.&lt;br /&gt;&lt;br /&gt;Why should this be a problem?   Well, Top Gun doesn’t see why she should include Group Manager as an author since Group Manager has made no contribution to this piece of work.  This is a real break from tradition because Group Manager is always listed as an author on anything that anyone in the group publishes, regardless of whether he has actually contributed to the work in question.  It has always been this way, ever since the group was just Group Manager and nobody else.  &lt;br /&gt;&lt;br /&gt;You’ll all have seen similar examples, be they in systems biology, proteomics, informatics, molecular modeling or protein crystallography.  Then there's the Med Chem manager with a remarkable ability to appear on patents.  Be wary of the offer to ‘take a look at your manuscript’ since the comments that come back may well include a couple of completely rewritten paragraphs representing Group Manager’s ‘contribution’ to your article.  Frequently, Group Manager’s boss will just let him get away with it since to do otherwise would require a degree of moral fibre that is rarely encountered at that level of the organisation.  Sometimes Group Manager’s boss is actually in on the scam and is also getting on papers.  However, we should not be too hard on Group Manager because he really does think that he is equivalent to a professor in a university and that the members of the group are his post-docs.&lt;br /&gt;&lt;br /&gt;A similar situation can exist in small companies, except that Group Manager may now be Chief Technical Officer (CTO).  In some cases the company grows too fast for CTO to maintain his grip on publishable material within the company. There will be pain all round when somebody makes CTO realise that it’s not totally reasonable to expect to be listed as an author of work that he had no awareness of.  
So if you see CTO pulling up weeds in the car park, he’s probably just sulking because one of The Great Unwashed managed to slip something into a journal without him knowing.&lt;br /&gt;&lt;br /&gt;They are like monkeys.  The higher they climb, the more revolting are the parts that they display.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Any similarity between the characters in this Crapshoot and real people, living or dead, is entirely coincidental.  No animals, children, Senior Pharma Fellows, Group Managers, Key Opinion Leaders or Chief Technical Officers were harmed in the preparation of this Crapshoot.&lt;/em&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-2844979842238222307?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/2844979842238222307/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=2844979842238222307' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2844979842238222307'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2844979842238222307'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/11/scaling-scientific-eigernordwand-part-2.html' title='Scaling the Scientific Eigernordwand: Part 2'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-4114910244862874589</id><published>2008-10-19T04:18:00.000-07:00</published><updated>2010-09-26T15:06:27.548-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='organisational'/><category scheme='http://www.blogger.com/atom/ns#' term='pharma life'/><category scheme='http://www.blogger.com/atom/ns#' term='gossip'/><title type='text'>Scaling the Scientific Eigernordwand</title><content type='html'>Last week, eight of us were sitting around in a brew pub between two days of very tedious meetings.  Somebody mentioned the recent promotion of an individual to a very senior scientific position in The Company.  Three of the group were amazed that such an individual could have risen to such an exalted position while none of the remaining four members of the party had actually heard of the individual in question.&lt;br /&gt;&lt;br /&gt;If you’ve worked in a pharmaceutical company, you may have occasionally encountered individuals like the one we’re talking about.  They’ll have the ear of Upper Management and a grand job title that suggests that they are amongst The Company’s scientific elite.    Their scientific prowess will be trumpeted in The Company Pravda and there will be pictures and videos of them hobnobbing with Upper Management and important visitors like Key Opinion Leaders.  Upper Management will describe them as visionary, pre-eminent drug hunters and exceptional role models for The Great Unwashed who occupy the lower strata of the food chain. &lt;br /&gt;&lt;br /&gt;So how did this scientific elite come to be?  The quick answer is: in a variety of ways. They may have got in at the start and, as the company grew, just kept moving up.  Sometimes it’s simply a case of diversity-driven career progression although, thankfully, this is rare.  
More likely, they got there because in Pharma, Upper Management need tame scientists to help them maintain the illusion that they care deeply about and are intimately involved in the science of drug discovery.&lt;br /&gt;&lt;br /&gt;The most important thing to remember about members of this elite group is that they are there because Upper Management says that they are the best scientists.  They have not got to the top by being negative and saying that drug discovery is a difficult and unpredictable business.  In time they become an integral part of the management structure.  Upper Management has a potential problem with how to distinguish these individuals from The Great Unwashed.  However, this problem is easily solved by noting that Upper Management are important people and that advising important people is important work.  Such important work that the people doing it need to be promoted to levels that are completely inaccessible to The Great Unwashed.&lt;br /&gt;&lt;br /&gt;The existence of this scientific elite makes for some interesting political activity within Big Pharma organizations.  Suppose you’re running an established pharmacokinetics department that is seen as less dynamic and energetic than the anti-viral task force set up a year ago.  One way to make your tired department look more dynamic and energetic is to find somebody that you can get away with promoting to Senior Pharma Fellow (SPF).  This is also a good time to call in some favours from your buddies in academia.  Can they create an important-sounding visiting professorship for your SPF at their institution?  Maybe invite him to give some lectures at their university?  And of course we’ll provide the funding for that post-doc that you’ve been gagging for…&lt;br /&gt;&lt;br /&gt;So you’ve wrong-footed your anti-viral opponents.  Well done!  
Their task force may be bursting at the seams with bright, energetic, thrusting individuals who were the best post-docs in the best groups in the best universities.  The best and the brightest, one might say, but you’re the one with the SPF.  However, don’t stop now because there’s still work to do.  You need to get your SPF onto the Research Coordination Committee (RCC).  That sounds difficult!  How do we do that?  Trivial problem!  Just tell the chinless wonders who decide who sits on the RCC that he’s an SPF and it’ll happen.  They’ll have even less of a clue about your SPF’s scientific limitations than your anti-viral opponents.  If anybody asks awkward questions, just remind them that the appointment of an SPF is rubber-stamped (better to say ‘authorised’ though) at the highest level in the organisation.  Now there’s one more thing that you’ve got to do.  Get SPF onto an external committee where he’ll be able to review grant proposals and really put the screws on those tiresome academics.  Excellent work!  Now you don’t need to feel guilty about that conference in Cancun at which you’ll be giving an undemanding general overview of pharmacokinetics to help out your old university buddy who is chairing a session...&lt;br /&gt;&lt;br /&gt;Welcome back from Cancun.  We hope you had a great trip, free of Montezuma’s Revenge and the Inca Quickstep.  We don’t want to alarm you, but a couple of problems cropped up with SPF while you were away.  Turns out SPF decided to update the department on his global vision for the future of pharmacokinetics.   Trouble is that he got a bit carried away with his optimistic view of the state of the art of human dose prediction and some smart ass exposed his rather rudimentary understanding of allometric scaling… &lt;br /&gt;&lt;br /&gt;Shit!  &lt;br /&gt;&lt;br /&gt;And that’s not all.  
Last week, he set up a meeting with the folk working on transporters, made a couple of vapid comments on their results and now thinks he should be on the paper that they’re writing with their academic collaborator.   &lt;br /&gt;&lt;br /&gt;Shit!  I go off to Cancun for two weeks and come back to this!  How could you let it happen?  &lt;br /&gt;&lt;br /&gt;Don’t blame us, we only write The Crapshoot.  And please don’t swear.  You'll have to talk to him yourself. Be firm like you’re house-training a puppy that keeps pissing on the floor.  Here’s what you’ve got to say:&lt;br /&gt;&lt;br /&gt;You imbecile!  I go to Cancun for two weeks and I come back to this!  I didn’t make you an SPF because you’re a good scientist.  I made you an SPF because this department is crap and all the other departments know we’re crap.  If we have our own SPF, the head of research and the other departments might think we’re less crap than they do now.  Your job is not to provide leadership for the scientists or anybody else in this department.  You are now in management although you will have no power.  Your job is to present anodyne graphics with lots of metrics to the fuckwits who have the power to cut my budget.  Just like how they do the weather on TV.  Why the sour face?  Try smiling instead of looking like you’re sucking a lemon.  Get a synchronised swimming DVD if you don’t know how and while you’re at it learn how to use an autocue!  Now I don’t want any of this visionary, pre-eminent bullshit going to your head.  We had to say it otherwise we couldn’t have justified your promotion to those meddlesome, opinionated cretins in Human Resources.  Also I don’t want you talking to our scientists any more and don’t even think of talking to those bastards in the anti-viral task force.  If you want to write papers, there are plenty of lightly-refereed journals that you can do reviews for.  Maybe we can even get you on the editorial board of one or perhaps a monthly column. 
&lt;br /&gt;&lt;br /&gt;That’s all for now, folks.  Please stay tuned for the next instalment. &lt;br /&gt;&lt;br /&gt;&lt;em&gt;Any similarity between the characters in this Crapshoot and real people, living or dead, is entirely coincidental.  No animals, children, managers or Senior Pharma Fellows were harmed in the preparation of this Crapshoot.&lt;/em&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-4114910244862874589?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/4114910244862874589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=4114910244862874589' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4114910244862874589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4114910244862874589'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/10/scaling-scientific-eigernordwand.html' title='Scaling the Scientific Eigernordwand'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-4585275491066685671</id><published>2008-09-30T12:48:00.000-07:00</published><updated>2010-09-26T15:06:27.568-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pharma life'/><title type='text'>Innovation deficit syndrome</title><content type='html'>It just doesn't get any easier in 
Pharma.  We can't find the new drugs quickly or cheaply enough for Our Leaders, who think they're making cars. The regulatory path is a tangled web of tripwires and claymores that we have helped to construct. Darth Genericus is poised to exploit openings the moment we are no longer in a position of strength.  Innovation is the key but we seem to innovate less.  &lt;br /&gt;&lt;br /&gt;Our diagnosis is Innovation Deficit Syndrome, which we propose calling IDS.  Interestingly, these are the initials of a &lt;a href="http://en.wikipedia.org/wiki/Iain_Duncan_Smith" target="_window"&gt;former Leader&lt;/a&gt; of &lt;a href="http://en.wikipedia.org/wiki/Conservative_Party_(UK)" target="_window"&gt;Britain's Conservative Party&lt;/a&gt;, and he was known by these initials.  It was said then that the initials referred to both Leader and Party.&lt;br /&gt;&lt;br /&gt;In Deep Shit.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-4585275491066685671?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/4585275491066685671/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=4585275491066685671' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4585275491066685671'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4585275491066685671'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/09/innovation-deficit-syndrome.html' title='Innovation deficit syndrome'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-4219179403449140373</id><published>2008-09-27T11:03:00.000-07:00</published><updated>2010-09-26T15:06:27.576-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='amusing or bizarre'/><title type='text'>Schadenfreude</title><content type='html'>Recently our day was greatly enriched by the observation of the rear half of an expensive sports car protruding from some bushes. The local topography of the road appeared unchallenging and no other vehicles appeared to have been involved.&lt;br /&gt;&lt;br /&gt;Vorsprung durch dummkopf.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-4219179403449140373?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/4219179403449140373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=4219179403449140373' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4219179403449140373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4219179403449140373'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/09/schadenfreude.html' title='Schadenfreude'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5228467705952296948</id><published>2008-09-24T14:47:00.000-07:00</published><updated>2010-09-26T15:07:40.467-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><title type='text'>It really is a crapshoot: Conclusion</title><content type='html'>So you were probably wondering what all that (&lt;a href="http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-1.html" target="_window"&gt;1&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-2.html" target="_window"&gt;2&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-3.html" target="_window"&gt;3&lt;/a&gt;) was about.  The coin that lands heads up 55 percent of the time is clearly an abstraction.  It has been concocted to help our loyal readers think about the significance of significance.&lt;br /&gt;&lt;br /&gt;This coin is most definitely biased.  The problem is that you’re going to have to toss it quite a number of times before that bias reveals itself.  The coin is like a weak relationship between something unpleasant (like being promiscuous in the strictest pharmacological sense) and something you can measure (like lipophilicity).  Powered by enough data, even the weakest trends become significant.&lt;br /&gt;&lt;br /&gt;To note the significance of a trend without showing the underlying variation in the trend is like simply noting that a coin is biased.  In this world all biased coins are equal.  These include our 55 percent coin and the one that lands heads up 999 times out of 1000.  
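&lt;br /&gt;&lt;br /&gt;For readers who would rather see the arithmetic than take our word for it, here is a minimal sketch in Python (our choice; the original series contains no code). It assumes a one-sided exact binomial test against a fair coin and asks how small the p-value gets when our 55 percent coin lands heads exactly as often as expected in n tosses. The function names are hypothetical and the sample sizes are arbitrary illustrations.

```python
from math import comb


def upper_tail_p_fair(n, k):
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, 0.5).

    Summing integer binomial coefficients and dividing by 2**n at the end
    avoids the float overflow that comb(n, i) * 0.5**n would hit for large n.
    """
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


def p_value_for_typical_run(n, bias=0.55):
    """p-value against a fair coin when a coin with the given bias lands
    heads exactly as often as expected (rounded to a whole number)."""
    k = round(bias * n)
    return upper_tail_p_fair(n, k)


# Same coin, same bias: only the number of tosses changes.
for n in (20, 100, 500, 2000):
    print(f"{n:5d} tosses: p = {p_value_for_typical_run(n):.3g}")
```

With 20 tosses the p-value is roughly 0.41, nowhere near significant; by a couple of thousand tosses it has fallen far below any conventional threshold, even though the coin itself has not changed one bit.&lt;br /&gt;&lt;br /&gt;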
So think about this the next time that you’re &lt;a href="http://gmc2007.blogspot.com/2008/07/desperately-seeking-signifcance.html" target="_window"&gt;desperately seeking significance&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The 20-player game is a little more difficult to place in a drug discovery context but we won’t let that stop us.  We know our readers are depending on us.  Let’s return to that weak relationship between the unpleasant and the measurable.  Allowing your chemists to make more lipophilic compounds increases the risk of it all going belly up later.  However, drug discovery is a risky business and the risk of choking in development is just one risk you face.   You can’t win by just not choking in development.  You need to get your drug to market before your competitors, or it needs to be better or cheaper than what has already got there.   The 20-player game tells us that the risks cannot be assessed in isolation.  The risks depend on what the other players are doing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5228467705952296948?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5228467705952296948/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5228467705952296948' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5228467705952296948'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5228467705952296948'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-conclusion.html' title='It really is a crapshoot: Conclusion'/><author><name>Georg-Martin 
Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-2470129554124655810</id><published>2008-09-14T14:44:00.000-07:00</published><updated>2010-09-26T15:07:40.470-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><title type='text'>It really is a crapshoot 3</title><content type='html'>Well it didn’t look like anybody wanted to join the fun. &lt;br /&gt;&lt;br /&gt;The 2-player game introduced in the &lt;a href="http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-1.html" target="_window"&gt;first post of this series&lt;/a&gt; is relatively simple.  You and your opponent both know that the coin lands heads up 55 percent of the time.  If you both keep calling heads nobody will win.  If you know your opponent will call heads you can call tails for the first round and then stop.  You will have a 45% chance of winning compared to no chance of winning if you keep calling heads.   &lt;br /&gt;&lt;br /&gt;The 20-player game described in the &lt;a href="http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-2.html" target="_window"&gt;second post of this series&lt;/a&gt; is much more interesting.  If the other 19 players all consistently call heads, calling tails makes a lot of sense.  Suppose the stake is $1.  You expect to lose 11 games out of 20 so for these you’re going to be $11 down.  However, you’ll win back $19 for each game that you win so you can expect to be $160 up over 20 games.  
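&lt;br /&gt;&lt;br /&gt;The arithmetic above can be checked with a few lines of Python. This is a sketch only, assuming the rules of the second post (equal stakes, with the losers' stakes shared equally among the winners) and assuming that all 19 other players always call heads. Exact fractions are used so the numbers come out clean; the names are ours and purely illustrative.

```python
from fractions import Fraction

P_HEADS = Fraction(55, 100)  # the biased coin from the series
N_PLAYERS = 20
STAKE = 1  # dollars per player per game


def contrarian_payoff(heads):
    """Net result of one game for the lone tails-caller when the other
    19 players all call heads (losers' stakes are shared by the winners)."""
    others = N_PLAYERS - 1
    if heads:
        return -STAKE  # our stake is split among the 19 heads-callers
    return others * STAKE  # we collect all 19 losing stakes


def expected_per_game():
    """Expected net gain per game, as an exact fraction."""
    return P_HEADS * contrarian_payoff(True) + (1 - P_HEADS) * contrarian_payoff(False)


print("expected gain per game:", expected_per_game())
print("expected gain over 20 games:", 20 * expected_per_game())
```

Running it reports an expected gain of $8 per game and $160 over 20 games, in agreement with the figures above.&lt;br /&gt;&lt;br /&gt;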
However, if that $1 represents everything you’ve got, there is a 55% chance that you’ll get wiped out before you can start winning.&lt;br /&gt;&lt;br /&gt;Things get much more complicated if the other players become less conservative and are prepared to bet on the less probable outcome.  Knowledge is power in this game.  Knowing exactly how biased the coin is and what your opponents are going to do will either lead you to a winning strategy or persuade you that this is a game that cannot be won. &lt;br /&gt;&lt;br /&gt;Are you sure that you want to be a tosser?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-2470129554124655810?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/2470129554124655810/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=2470129554124655810' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2470129554124655810'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2470129554124655810'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-3.html' title='It really is a crapshoot 3'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-438423856620349366</id><published>2008-09-11T11:44:00.000-07:00</published><updated>2010-09-26T15:07:40.473-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><title type='text'>It really is a crapshoot 2</title><content type='html'>Now let’s make the game more exciting.  We’re still tossing the &lt;a href="http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-1.html" target="_window"&gt;coin that lands heads up 55 percent of the time&lt;/a&gt; but we’ve increased the number of players to 20.  As before, all players stake an equal amount.  Those who call correctly retain their stakes and get an equal share of the prize money, which is funded equally by the losers.  NNT (Nassim Nicholas Taleb) would accuse us of falling for the Ludic Fallacy and we would say that he may be right but we write The Crapshoot.  So what’s your strategy?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-438423856620349366?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/438423856620349366/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=438423856620349366' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/438423856620349366'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/438423856620349366'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-2.html' title='It really is a crapshoot 2'/><author><name>Georg-Martin 
Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5183060447890440625</id><published>2008-09-08T14:33:00.000-07:00</published><updated>2010-09-26T15:07:40.476-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><title type='text'>It really is a crapshoot 1</title><content type='html'>Our faithful readers will be pleased to learn that our last post didn’t earn us a one-way ticket to the auto da fé.  Or even excommunication, although that doesn’t quite come with the bragging rights that it used to.  This will be a short post in which we introduce the 55 percent coin.&lt;br /&gt;&lt;br /&gt;“The 55 percent coin”, we hear you cry, “what possible relevance could that have to drug discovery?  Surely, M. le Crapshoot, you have finally lost your marbles”.  Patience, dear readers, the 55 percent coin is merely a means to illustrate some of the things we’ve discussed in previous posts.&lt;br /&gt;&lt;br /&gt;The 55 percent coin has a 55 percent chance of landing heads up and a 45 percent chance of landing tails up.  The observant amongst you will have observed that this coin cannot land on its edge.  Carry on observing and you will go far.  Maybe you can even become a Leader (don’t worry if you’re not observant because they do leadership courses for wannabe leaders who lack powers of observation).&lt;br /&gt;&lt;br /&gt;Toss the coin 10 times and there will be little indication that the coin is not fair.  After 100 tosses, something will look a bit fishy.  After 1000 tosses it will be abundantly clear that all is not well and your colleagues may also call you a tosser.  Armed with that information, what do you call when the coin is tossed?  
Heads, of course!  But how sure are you that it’s really going to land heads up?&lt;br /&gt;&lt;br /&gt;Now suppose you are playing this game against an opponent.  You each stake the same amount on your prediction (heads or tails) of the toss, the coin is tossed and the correct prediction beats the incorrect prediction.  The winner takes the loser’s stake and, in the event of a draw (two correct or two incorrect predictions), each player keeps his/her stake.  That concludes a round.  You may leave the game at the end of any round.&lt;br /&gt;&lt;br /&gt;So that’s the game and, just to make it fun, we’re going to play it with the 55 percent coin.  So what’s your winning strategy?  To make things interesting, consider the following two scenarios which we’ll call A and B:&lt;br /&gt;&lt;br /&gt;A: Both you and your opponent know that the coin lands heads up 55% of the time.&lt;br /&gt;&lt;br /&gt;B: Both you and your opponent know that the coin lands heads up 55% of the time but you also know that your opponent will always predict heads.&lt;br /&gt;&lt;br /&gt;Let the games begin!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5183060447890440625?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5183060447890440625/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5183060447890440625' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5183060447890440625'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5183060447890440625'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/09/it-really-is-crapshoot-1.html' 
title='It really is a crapshoot 1'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-7306408592375476261</id><published>2008-08-04T14:16:00.000-07:00</published><updated>2010-09-26T15:07:40.478-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='opinion'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><title type='text'>The Pope, The Atheist and an Irishman called Dave</title><content type='html'>After looking at a few data sets, you’ll come to appreciate the limitations imposed by only having two dimensions in the great data-analytic sandpit.  We have seen how being trapped in a two-dimensional world can cause the weak-willed and those lacking moral fibre to yield to Categorical Sin (&lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-2-sacred-cow-culled.html" target="_window"&gt;1&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2008/07/desperately-seeking-signifcance.html" target="_window"&gt;2&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Fortunately, help is at hand and there are a number of ways to bring extra dimensions into what would appear to be a two-dimensional space.  For example, the bubble plot allows you to encode a third dimension using the size of the markers in a scatter plot.  Add some color-coding of the markers and all of a sudden you’ve got four dimensions.  Use different markers (e.g. crosses, circles and squares) and very soon you’ll be able to carve out your very own little piece of hyperspace.  &lt;br /&gt;&lt;br /&gt;Plotting your data is of course A Good Thing.  It’s a first step that must be taken before proceeding with statistical analysis.  
Also, creating a nice plot with lots of colors and shapes and all that stuff is quite fun and can make even the pathologically ungifted feel seriously clever.   Attractive to the opposite sex even.  &lt;br /&gt;&lt;br /&gt;Analysis is relatively easy when you’ve only got two variables.  You can calculate a correlation coefficient or use linear regression to fit the dependent variable to the independent variable.  If the plot suggests the relationship between the variables is not linear, you can fit a curve.   The plot may suggest that the things (let’s assume they’re chemical structures) for which you’ve calculated the variables are falling into two clusters.  In that case you’re probably better off just trying to understand whether there is a structural basis for that clustering.&lt;br /&gt;&lt;br /&gt;Life gets a lot more difficult when your plot has extra dimensions squeezed in.  How should you analyze that oh-so-sexy bubble plot with the color-coded elliptical bubbles?  Well, you might respond piously with the tried and tested, “Well, a picture tells a thousand words, you know”.  The problem is that data-analytic capability just hasn’t kept up with the graphics.   So just how interesting is that bubble plot?  Don’t ask us, we just write The Crapshoot.  So you might leap onto your high horse and mention that humans can see patterns that computers just can’t.  True enough, and that last comment tells us that it’s now time for The Pope and The Atheist to join us.&lt;br /&gt;&lt;br /&gt;So what does this all have to do with Popes and Atheists?  And who is this Dave chap?  &lt;br /&gt;&lt;br /&gt;Patience, loyal readers, all will be revealed.   The &lt;a href="http://en.wikipedia.org/wiki/Dave_Allen_%28comedian%29" target="_window"&gt;late Dave Allen&lt;/a&gt; was a fine comedian and we will try to do justice to one of his contributions despite having heard it only once 20 years ago.  
The Master would have made this last 10 minutes and would surely have included his abbreviated index finger but we will be briefer.   We would welcome feedback from any of our readers who may be more familiar with the joke than we are.&lt;br /&gt;&lt;br /&gt;The action starts with The Pope and The Atheist in conversation.  His Holiness has just returned from the Influencing Skills course that the Lean Six Sigma folk have specially customised to Vatican specifications.  &lt;br /&gt;&lt;br /&gt;His Holiness says to The Atheist, “My son, you are like a blind man, wearing a blindfold, in a totally dark room, looking for a black cat that isn’t there.”&lt;br /&gt;&lt;br /&gt;“That is true, Your Holiness”, says The Atheist, “but you too are like a blind man, wearing a blindfold, in a totally dark room, looking for a black cat that isn’t there.  Only difference is you’ve found it.”&lt;br /&gt;&lt;br /&gt;Meow!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-7306408592375476261?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/7306408592375476261/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=7306408592375476261' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7306408592375476261'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7306408592375476261'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/08/pope-atheist-and-irishman-called-dave.html' title='The Pope, The Atheist and an Irishman called Dave'/><author><name>Georg-Martin 
Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-6150110099932495189</id><published>2008-07-15T13:17:00.000-07:00</published><updated>2012-01-27T13:08:07.333-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='categorical sin'/><category scheme='http://www.blogger.com/atom/ns#' term='oral drugs'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='stamp collecting'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='gsk'/><category scheme='http://www.blogger.com/atom/ns#' term='jmc'/><title type='text'>Desperately seeking significance</title><content type='html'>It really has been a while since we posted so we start this post with a grovelling apology to our loyal and patient readers.  Cider consumption has increased sharply and we have struggled to generate and maintain motivation.  This is partly due to the nature of the literature that posting on the column forces us to read and in fact we’d almost forgotten what we planned to post on.  &lt;br /&gt;&lt;br /&gt;Regular readers of this column will be well aware of what we mean by categorical sins.  The usual trick involves plotting average values of some property of interest (e.g. solubility, bioavailability or promiscuity) for each category. We think this is very naughty because variation is hidden.  An even more insidious practice involves transforming continuous variables like ClogP and molecular weight (MW) into categories such as MW between 300 and 500.  Why do we term this behaviour sinful?  
We do this because the motivation for hiding variation is normally to make weak trends appear stronger, in the process making the people presenting the trends look smarter and more cultured.   That is why we use the term sin.  We stress that it is not a consequence of the participation of Jesuits in our early education.&lt;br /&gt;&lt;br /&gt;A number of relevant publications have appeared since we first introduced our readership, which we estimate to number no more than 6 hardy souls, to the concept of categorical sin.  The &lt;a href="http://dx.doi.org/10.1021%2Fjm701122q" target="_window"&gt;publication that features in today’s Crapshoot&lt;/a&gt; claims to present simple, interpretable ADMET rules of thumb and it can be found in a journal with an enticingly high impact factor.  In fact one of our fellow bloggers has &lt;a href="http://ashutoshchemist.blogspot.com/2008/06/admetthespeedofthought.html" target="_window"&gt;even beaten us to it&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Let’s dive straight in!  We ask that you take a look at the top left plot (there are 6 of them) in Figure 2, which is labelled a.  Four categories of molecular weight have been defined with the boundaries at 300, 500 and 700 and mean values of log(Solubility) have been plotted for each category.   Presented like this, the relationship between solubility and molecular weight looks strong.  However, we’ve seen that &lt;a href="http://gmc2007.blogspot.com/2008/05/breaking-stone-in-changi.html" target="_window"&gt;plotting data in this manner can make a weak relationship appear to be stronger than it is&lt;/a&gt;.  At least r-squared values were not quoted for this plot as was done in the other article.&lt;br /&gt;&lt;br /&gt;Our other criticism of this plot is that variation is hidden.  Wait a minute, we hear you cry, there are error bars.  Patience, dear readers, error bars are indeed present but the variation is still hidden.   
How can that be and why do you persist in playing these silly games with us?&lt;br /&gt;&lt;br /&gt;Typically we calculate the standard deviation when we want to quantify variation.  There can be problems with this if the distribution suffers from excessive skewness, kurtosis or halitosis but the standard deviation is a good place to start.  If Figure 2b (solubility by charge type) had been presented showing the standard deviations for each category we would have been satisfied and the Crapshoot would have been focusing elsewhere.  However, the error bars are not standard deviations.&lt;br /&gt;&lt;br /&gt;The error bars in these plots are actually confidence intervals for the mean.  Each of these confidence intervals is derived from the &lt;a href="http://en.wikipedia.org/wiki/Standard_error_%28statistics%29" target="_window"&gt;standard error&lt;/a&gt; of the mean, which in turn is obtained by dividing the standard deviation by the square root of the number of observations.  Confidence intervals defined in this manner would normally be used to address questions like whether mean solubility is significantly different for cations and neutrals.  If you have enough data, even very small differences in means will become significant and we’ll return to that point at the end of the post.&lt;br /&gt;&lt;br /&gt;Well, that was all a bit of a mouthful, wasn’t it? Let’s take a look at Figure 3a (the top left plot in Figure 3) which shows mean values of log(Bioavailability) for four categories of MW (&lt;300, 300-500, 500-700, &gt;700).  You’ll notice that the error bars are larger for the &lt;300 and &gt;700 groups of compounds.  So this means that there is more variation in bioavailability for these groups of compounds, doesn’t it?  Well, not exactly.  It could mean that there are fewer compounds in these groups than in the other groups.  
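&lt;br /&gt;&lt;br /&gt;To make the point concrete, here is a minimal Python sketch using made-up numbers. It assumes the error bars are 95% confidence intervals built with the normal approximation, i.e. a z value of 1.96; a t quantile would be more appropriate for small groups. The function names are ours, not anything from the article. Two groups with identical standard deviations get very different error bars purely because of their sizes, and you cannot get back to the standard deviation from an error bar unless you know n. The last lines show the related point that, with enough data, a small difference in means becomes ‘significant’.

```python
from math import sqrt

Z_95 = 1.96  # normal approximation; a t quantile suits small groups better


def ci_half_width(sd, n, z=Z_95):
    """Half-width of a confidence interval for the mean: z * SD / sqrt(n)."""
    return z * sd / sqrt(n)


def sd_from_ci_half_width(half_width, n, z=Z_95):
    """Recover the standard deviation from an error bar (only possible if n is known)."""
    return half_width * sqrt(n) / z


def z_for_difference(mean_a, mean_b, sd, n_a, n_b):
    """Two-sample z statistic for a difference in means, assuming a common SD."""
    se_diff = sd * sqrt(1 / n_a + 1 / n_b)
    return (mean_a - mean_b) / se_diff


# Hypothetical groups with identical spread (SD = 1 log unit) but different sizes.
for n in (25, 400):
    print(f"n = {n:4d}: 95% CI half-width = {ci_half_width(1.0, n):.3f}")

# A hypothetical 0.1 log unit difference in means, tested at two group sizes.
for n in (50, 5000):
    print(f"n = {n:5d} per group: z = {z_for_difference(0.1, 0.0, 1.0, n, n):.2f}")
```

The same 0.1 log unit difference gives a z of 0.5 with 50 compounds per group, which is nowhere near significant. With 5000 compounds per group the z is 5, which is highly significant, even though the difference itself is no larger.&lt;br /&gt;&lt;br /&gt;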
The problem is that we don’t know how many compounds are in each group, so we don’t really know much about the standard deviations for the 4 groups of compounds. And that is why we say that the variation is hidden.&lt;br /&gt;&lt;br /&gt;There are plots in this article for a number of properties relevant to drug discovery and it is not our intention to review them all.  Compounds are either grouped according to MW as described above or according to charge type. The article claims to provide simple, interpretable ADMET rules of thumb but we were not sure how these plots should be used.  Do we expect a compound with MW of 499 to be more similar in its properties to a compound with MW of 301 than to a compound with MW of 501 just because somebody has chosen to set boundaries at 300 and 500?  We were unclear where all this was heading until we got to section 2.6 (Rules of thumb for a given set molecular properties).  At that point it became clearer where things were heading and we did rather wish we had gone somewhere else instead.&lt;br /&gt;&lt;br /&gt;A new categorisation scheme is introduced in section 2.6.  Compounds were categorised as desirable if MW is less than 400 and logP &lt; 4.  All other compounds were categorised as less-desirable.  This categorisation was overlaid onto the four charge types (neutral, anion, cation and zwitterion) to give a total of eight categories.  We hope you’re still with us because we must admit to having become a bit disorientated ourselves with all of this slicing and dicing of the data.  Now it’s time to bring it all together.  Analyses were performed for 13 properties relevant to ADMET by comparing mean values for each category with mean values for the full data sets.  Take a look at Table 3 if you want to see what these look like.  
The comparisons are coded as higher, lower or average relative to the mean for the full data set.&lt;br /&gt;&lt;br /&gt;The observant amongst you will be wondering how one might use the results of this analysis.  Let’s take a look at the very first row of Table 3 which corresponds to solubility for neutral compounds.  The desirable category is labelled ‘average’ and the less-desirable category is labelled ‘lower’. How should we use this information?   Presumably, if our favourite compound has an MW of 390 and is a bit less soluble than we would like, it would be extremely unwise to add a hydroxyl group, because the extra 16 Da or so will push MW over the edge.&lt;br /&gt;&lt;br /&gt;Let’s pause for a moment to think about why people dare to make compounds that are so offensive to the Molecular Weight Gestapo and ClogP Thought Police.  It isn’t exactly a secret that excessive lipophilicity and MW cause molecules to bind to proteins that you’d prefer they didn’t.  Generally people make these compounds in order to increase binding to the primary target.  This is usually overlooked when presenting analysis based on &lt;a href="http://gmc2007.blogspot.com/2008/05/breaking-stone-in-changi.html" target="_window"&gt;questionable (i.e. in which variation is hidden) plots of promiscuity against lipophilicity&lt;/a&gt;.  When you’re struggling to achieve potency, you really need to have something a little more concrete than a pious statement of the sinfulness of lipophilicity and MW. &lt;br /&gt;&lt;br /&gt;It’s now back to Table 3, which tells us that solubility of neutral compounds will change from ‘average’ to ‘lower’ if we let MW go over 400 or ClogP exceed 4.  Now the chemist probably has some idea of how much extra potency will result from that structural change that takes ClogP from 3.8 to 4.2.  So it’s now over to the rules of thumb.  How much solubility are we going to lose?  No, you can’t phone a friend.  OK, just tell us how much lower than ‘average’ ‘lower’ is?  
We’re sorry, but that just wasn’t the answer that we were looking for.&lt;br /&gt;&lt;br /&gt;We really owe it to our loyal readers to tell them how different ‘average’ and ‘lower’ really are.  For solubility of neutral molecules with MW &gt; 400 or ClogP &gt; 4, ‘lower’ simply means that the average solubility for this group of compounds is significantly lower than the average solubility for all the compounds measured.  The average solubility for the group of compounds (MW &lt; 400 and ClogP &lt; 4) does not differ significantly from the average for the full data set, so these compounds are labelled ‘average’.  However, if you have enough data, even small differences can become significant. &lt;br /&gt;&lt;br /&gt;Well, it’s now pub time so that’s just about it from us in this post, but before we go we’ll share a thought.  Did you know that the drug you’re taking daily holds the world record for the number of volunteers in a clinical trial?  Your reaction is:&lt;br /&gt;&lt;br /&gt;a) Great!  It must be significantly better than placebo with that number of people.&lt;br /&gt;&lt;br /&gt;b) Oh shit!  
It took THAT many people to see an effect.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-6150110099932495189?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/6150110099932495189/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=6150110099932495189' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6150110099932495189'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6150110099932495189'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/07/desperately-seeking-signifcance.html' title='Desperately seeking significance'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-7573259188510467106</id><published>2008-05-16T12:58:00.000-07:00</published><updated>2012-01-27T13:09:49.081-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nrdd'/><category scheme='http://www.blogger.com/atom/ns#' term='categorical sin'/><category scheme='http://www.blogger.com/atom/ns#' term='oral drugs'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='stamp collecting'/><category scheme='http://www.blogger.com/atom/ns#' term='az'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title 
type='text'>Breaking stone in Changi</title><content type='html'>Once again we have let things slip and once again we crave the indulgence of our loyal readers.  It is well over 3 weeks since we last posted.  Today we’ll take a look at &lt;a href="http://dx.doi.org/10.1038/nrd2445" target="_window"&gt;a review&lt;/a&gt; on &lt;a href="http://en.wikipedia.org/wiki/Druglikeness" target="_window"&gt;druglikeness&lt;/a&gt; which appeared in a high-impact journal.  Strictly, it is the parent journal that has the high impact factor but we’re not going to get bogged down by that minor detail.  This article has already been &lt;a href="http://gmc2007.blogspot.com/2007/12/crapshoot-so-far.html" target="_window"&gt;cited here in connection with categorical sins&lt;/a&gt;.  Much (mainly white) noise about it has been made where we work.   &lt;br /&gt;&lt;br /&gt;Actually, we’re not going to review the entire article.  Those of you who have nothing better to do than read this column will know that we are typically underwhelmed by publications on druglikeness and believe that as a concept it is rather over-rated.  Instead we will focus on one particular piece of data analysis that is described in the target publication.  Are we being lazy?  Read on and you can make your own minds up.  You are adults after all.&lt;br /&gt;&lt;br /&gt;We’d now like you to take a look at Figure 3a in the featured article.  The horizontal axis is ClogP and the vertical axis is promiscuity.  Promiscuity? Are the drugs getting up to something naughty about which we shouldn’t be writing in a family-friendly blog?  Promiscuity in this plot is defined by the number of assays for which at least 30% inhibition is observed at a concentration of 10 micromolar.  The plot suggests a strong relationship between promiscuity and lipophilicity, doesn’t it?  
Well, that’s what the authors of the article want you to think but, as loyal and cultured readers of the Crapshoot, you really should know better by now.&lt;br /&gt;&lt;br /&gt;Now let’s take a closer look at Figure 3a.  First, the horizontal axis is not ClogP but Median ClogP.  Where did the median come from?  A reasonable question and, if you’ll just let us continue, everything will become abundantly clear.  Well, sort of abundantly clear. The authors appear to have computed the median ClogP for each value of promiscuity.  Why have they done this?  The quick answer to this very reasonable question is to go and take a look at Box S3 in the supplementary information.&lt;br /&gt;&lt;br /&gt;The manner in which Figure 3a has been constructed gives it some rather unusual characteristics.  Most importantly, each value of promiscuity is represented by a single point regardless of the number of drugs with that value of promiscuity.   This distorts the original data by emphasizing the tails of the distribution and we think it’s a rather naughty thing to do.  Plotting the data as the authors have done displays the underlying trend in the data while hiding the variation in ClogP for individual values of the promiscuity.  This makes the trend easier to see but prevents us from knowing how strong it really is. &lt;br /&gt;&lt;br /&gt;One common approach to quantifying the strength of the relationship between two properties is to fit one to the other using regression.  Typically one starts by assuming a linear relationship but other functional forms (e.g. polynomial) are used if the plot suggests non-linearity.  One measure of the quality of fit is the &lt;a href="http://en.wikipedia.org/wiki/Coefficient_of_determination" target="_window"&gt;r-squared&lt;/a&gt;, which is the proportion of the variance in the dependent variable that is explained by the regression model.  The r-squared ranges in value from 0 (no fit) to 1 (perfect fit). &lt;br /&gt;&lt;br /&gt;Now let’s go back to Figure 3.  
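&lt;br /&gt;&lt;br /&gt;Before we do, here is a minimal Python sketch of why summary plots flatter. It computes Pearson r for a small, invented data set: three promiscuity categories with heavily overlapping ClogP values. For a straight-line fit with an intercept, r squared is the r-squared described above. It then repeats the calculation after collapsing each category to its median ClogP, which is how we understand Figure 3a to have been built. The numbers are made up purely for illustration and the helper names are ours.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def category_medians(xs, ys):
    """Collapse raw data to one (category, median y) point per category."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    cats = sorted(groups)
    return cats, [median(groups[c]) for c in cats]


# Invented data: promiscuity category (0, 1 or 2) and ClogP for 15 compounds.
promiscuity = [0] * 5 + [1] * 5 + [2] * 5
clogp = [1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 3, 4, 6, 7, 8]

r_raw = pearson_r(promiscuity, clogp)
cats, medians = category_medians(promiscuity, clogp)
r_summary = pearson_r(cats, medians)

print(f"raw data: r^2 = {r_raw ** 2:.3f}")              # raw data: r^2 = 0.311
print(f"category medians: r^2 = {r_summary ** 2:.3f}")  # category medians: r^2 = 0.964
```

The raw points give an r-squared of about 0.31. The three medians give about 0.96, even though the underlying data have not changed at all.&lt;br /&gt;&lt;br /&gt;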
It appears the authors have done the linear regression on the summary data shown in Figure 3a rather than the full set of original data.  They quote an r value of 0.83, which corresponds to an r-squared of 0.69.  It’s a good time to take another look at Box S3 in the supplementary material.  The data from which the summary shown in Figure 3a was generated is distributed between two plots, one for acids and bases and the other for neutrals, quaternary bases and zwitterions.  We were a little curious about how the ClogP values were derived for the quaternary bases and why the authors decided to group the charge types as they did.   However, that is not a path that we wish to go down right now and we’ll not make further mention of these concerns.   The plots show that promiscuity will be low when ClogP is very low.  However, maintaining potency when ClogP is that low is simply not going to be an option for many targets and you’re going to run into permeability problems if you drop ClogP too far.  The question we’d like to pose to you, our loyal readers, is whether you’d expect an r-squared value of 0.69 for either of the two plots in Box S3.&lt;br /&gt;&lt;br /&gt;Let’s pause for a moment to review what we’ve learned.  Firstly, quote r rather than r-squared because the latter can never exceed the former and your less alert readers may not even notice.  Secondly, and more importantly, averaging one variable (in this case, taking its median) over each of the categories of the other is likely to give you an optimistic view of the strength of the underlying relationship.  This is the basis of Categorical Sin and, to help convince you of the fundamental sinfulness of analyzing data in this manner, consider the situation in which there are only two categories of promiscuity (yes or no).  Now suppose the median ClogP values are different for the two categories.  What do you expect r-squared to be?  Everybody get 1?  
It really is an honor to write for such clever, cultured readers.&lt;br /&gt;&lt;br /&gt;Sadly, this is not the only example of Categorical Sin that we have encountered in the peer-reviewed literature (see &lt;a href="http://gmc2007.blogspot.com/2008/04/substituents-potencies-and-pinschers.html" target="_window"&gt;1&lt;/a&gt;, &lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-2-sacred-cow-culled.html" target="_window"&gt;2&lt;/a&gt;).  Why do the reviewers not pick these things up?  That is for journal editors to fret over and it would be grossly unfair to speculate about possible family connections with the unfortunate &lt;a href="http://en.wikipedia.org/wiki/Nick_Leeson" target="_window"&gt;rogue trader&lt;/a&gt; who famously lost his &lt;a href="http://en.wikipedia.org/wiki/Barings_Bank" target="_window"&gt;Barings&lt;/a&gt; in the &lt;a href="http://en.wikipedia.org/wiki/Singapore" target="_window"&gt;city-state of Singapore&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-7573259188510467106?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/7573259188510467106/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=7573259188510467106' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7573259188510467106'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7573259188510467106'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/05/breaking-stone-in-changi.html' title='Breaking stone in Changi'/><author><name>Georg-Martin 
Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-6263581778069987836</id><published>2008-04-20T01:37:00.000-07:00</published><updated>2010-09-26T15:07:40.528-07:00</updated><title type='text'>A year of The Crapshoot</title><content type='html'>It is now exactly one year since The Crapshoot made its first appearance.  As is customary on these occasions, we would like to thank all our readers, especially those who have commented on posts.  We are especially grateful to the authors of articles that have featured in our literature reviews and hope that the occasional less than flattering commentary has not been taken personally.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-6263581778069987836?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/6263581778069987836/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=6263581778069987836' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6263581778069987836'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6263581778069987836'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/04/year-of-crapshoot.html' title='A year of The Crapshoot'/><author><name>Georg-Martin 
Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-574863742593567993</id><published>2008-04-19T16:10:00.000-07:00</published><updated>2010-09-26T15:07:40.532-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='stamp collecting'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>Substituents, Potencies and Pinschers</title><content type='html'>So hopefully the suspense has built from our &lt;a href="http://gmc2007.blogspot.com/2008/04/performance-metrics-for-substituents.html" target="_window"&gt;previous post on the effects of common chemical substituents on ligand potency&lt;/a&gt;.  Some of our Loyal Readers will have been annoyed to have been dropped just as it was getting interesting and we can only offer our most abject and grovelling apologies.  The suspense had to be built but, much more importantly, we had to go to the pub.  They serve a particularly tasty cider there.  It’s cloudy, quite strong and, on a bad night, really makes your tongue tingle.  If you die from drinking it, it is improbable that your remains will require embalming.&lt;br /&gt;&lt;br /&gt;We are sorry that so much time has passed since the previous post.  Here’s the &lt;a href="http://dx.doi.org/10.1021/jm070838y" target="_window"&gt;link to the target article&lt;/a&gt;.  
We will continue to focus on Table 1.&lt;br /&gt;&lt;br /&gt;We have already discussed the categorical sins committed in slicing the tails off distributions to create the F(-1) and F(1) descriptors and you don’t need to be a &lt;a href="http://en.wikipedia.org/wiki/Doberman_Pinscher" target="_window"&gt;Doberman Pinscher &lt;/a&gt;to appreciate the fundamental immorality of these actions.  Nevertheless, slicing distributions is a commonly encountered data-analytic technique in drug discovery research although it tends to be less commonly encountered in statistical textbooks. One recurring concern we have with this data-analytic genre is that the slice points can be chosen to strengthen the conclusion that the investigators would like to draw.  Our challenge to the distribution slicers is to demonstrate that the results of their analyses are relatively insensitive to how the slicing is done.  Alternatively, they might consider methods for comparing the distributions that properly account for their continuous nature.&lt;br /&gt;&lt;br /&gt;However it is not just the distribution slicing that disturbs our digestion.  In order to make the next point we now ask that you ignore the previous paragraph and assume that the binary categorisation by distribution slicing is actually correct.  ‘Why do you play these games with our minds?’ we hear you cry and we simply ask that you make the assumption regardless of how absurd it may seem to you (and us) right now.  We ask because there are still a number of outstanding issues with the analysis presented in Table 1 of the featured article and it’s just easier to deal with these if you’re not distracted by whether the categorisation is indeed sinful.&lt;br /&gt;&lt;br /&gt;In the interests of time we’ll skirt over the arbitrariness of the choice of methyl as the substituent with which distributions for other substituents are compared.  
We will also skirt over why one would want to compare the effects of substituents with any particular substituent given that these effects have already been defined with respect to hydrogen.   If people choose to compare their substituent effects with methyl (or any of the other 52 substituents in Table 1) then it is really not for us to say.  The Crapshoot is a liberal, pro-choice sort of column and we believe that Our Loyal Readers are sufficiently mature to take responsibility for those choices that they make.&lt;br /&gt;&lt;br /&gt;More serious is the risk that the casual reader will take every distribution flagged as significantly different from that of methyl to be genuinely different from it. A contingency table analysis provides a probability that the observed effect could have been observed by chance alone.  The lower the probability, the greater the significance.   This is the way of The Statistician.  Take another look at Table 1 in the featured article.  The entry for F in the eighth column (*) tells us that the fraction of chlorine substitutions (0.064) that lead to at least a 10-fold potency increase and the corresponding figure for methyl (0.053) are significantly different with an associated probability of less than 0.05.  This means we are at least 95% sure that the distributions for methyl and chloro are different although it doesn’t mean that we necessarily care. Now let’s suppose we perform two contingency table analyses to compare the effects of substituents X and Y with methyl and get 95% in each case. Does that mean that we are 95% sure that X is significantly different from methyl and that Y is significantly different from methyl?  Well, not exactly.  If you want to consider both the substituents, you need to multiply the probabilities (95% x 95% ≈ 90%).  If you consider more substituents the problem only gets worse.    
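&lt;br /&gt;&lt;br /&gt;The arithmetic is easy to check with a minimal Python sketch.  It treats each comparison as an independent test at the 0.05 level, which is an assumption (every comparison shares the same methyl data, so the tests are not strictly independent), and it also shows the Bonferroni-adjusted per-test threshold, one common and conservative correction that does not rely on independence.  The figure of 52 comparisons simply assumes that each of the other 52 substituents mentioned above is compared with methyl.

```python
ALPHA = 0.05  # per-comparison significance level

def prob_no_false_positive(k, alpha=ALPHA):
    """Chance that k independent comparisons, all with no real difference,
    produce no false positives at level alpha."""
    return (1.0 - alpha) ** k

def bonferroni_threshold(k, alpha=ALPHA):
    """Per-comparison p-value threshold that keeps the chance of any
    false positive across all k comparisons at most alpha."""
    return alpha / k

for k in (1, 2, 10, 52):
    clear = prob_no_false_positive(k)
    print(f"{k:2d} comparisons: P(no false positives) = {clear:.3f}, "
          f"P(at least one) = {1.0 - clear:.3f}, "
          f"Bonferroni threshold = {bonferroni_threshold(k):.5f}")
```

With two comparisons this reproduces the 95% x 95% ≈ 90% figure; with 52 comparisons the chance of at least one spurious ‘significant’ difference is over 90%.&lt;br /&gt;&lt;br /&gt;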
We hope you’re still with us and apologise profusely for letting things get so tediously technical.  We are forced to admit that we’re still no closer to figuring out how, why or whether we should be using the results in Table 1 of the featured article.  Please let us know if you are.  &lt;br /&gt;&lt;br /&gt;Sorry that it all turned into a bit of a slog but we really must move on.  You will recall that the data for the analysis has been aggregated across up to 30 assays.  The nature of molecular recognition is that sometimes a substituent will increase potency, sometimes it will decrease it and sometimes it will have no effect at all.  Medicinal chemists are most interested in the first case where the substituent increases potency.  The second situation is still relevant if you can think of a suitable ‘anti-substituent’.  For example if you find that putting methyl on an aromatic carbon costs a lot of potency you might try replacing that carbon with nitrogen in case there is a hydrogen bond donor in the binding site whose solvation has been compromised by having a methyl group thrust at it.  Probably a bit of a long shot (we’re assuming no protein structure is available) but we think it’d still be a better bet than ethyl, butyl or futyl.  However when you average over both chemotype and assay, you are unnecessarily adding noise to your signal.  In that case, do you really expect to end up with anything other than the unremarkable and underwhelming Table 1?&lt;br /&gt;&lt;br /&gt;There is yet another problem.   The dynamic range of assays is limited.  If you have a substituent that tends to have a dramatic effect on potency then it will be less likely that you’ll be able to measure potency for both parent and substituted analog.  Let’s take a look at the F(-1) and F(1) values for carboxylate that are given in Table 1.  
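&lt;br /&gt;&lt;br /&gt;Before we do, here is a toy simulation of the dynamic-range problem as a minimal Python sketch.  Everything in it is assumed purely for illustration: a hypothetical assay window of pIC50 4 to 9, parent compounds spread evenly across that window, and normally distributed substituent effects for a ‘mild’ and a ‘dramatic’ substituent.  A matched pair is only recorded when the analog’s potency also falls inside the window.

```python
import random

# Hypothetical assay window: pIC50 values outside 4.0-9.0 cannot be measured
LOW, HIGH = 4.0, 9.0

def measurable(p):
    return p >= LOW and HIGH >= p

def simulate(effect_mean, effect_sd, n=50_000, seed=1):
    """Return (true F, observed F, fraction of pairs kept), where F is the
    fraction of pairs whose potency drops by at least one log unit."""
    rng = random.Random(seed)
    drops_true = drops_seen = kept = 0
    for _ in range(n):
        parent = rng.uniform(LOW, HIGH)             # parent assumed measurable
        delta = rng.gauss(effect_mean, effect_sd)   # true change in pIC50
        is_drop = -delta >= 1.0
        drops_true += is_drop
        if measurable(parent + delta):              # analog must be measurable too
            kept += 1
            drops_seen += is_drop
    return drops_true / n, drops_seen / kept, kept / n

# Hypothetical effect distributions (mean, SD of the change in pIC50)
for label, mean, sd in (("mild", 0.0, 0.7), ("dramatic", -1.0, 1.5)):
    true_f, seen_f, retained = simulate(mean, sd)
    print(f"{label:8s}: true F = {true_f:.3f}, observed F = {seen_f:.3f}, "
          f"pairs kept = {retained:.0%}")
```

In this toy world the dramatic substituent loses more pairs, and the pairs it loses are disproportionately the large drops, so the observed fraction understates the true one.  Real assays will of course have unevenly spread parents and windows that differ from one assay to the next.&lt;br /&gt;&lt;br /&gt;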
The value of F(1) is 0.247, which tells us that about a quarter of the time adding a carboxylate to an aromatic ring leads to at least a log unit drop in potency.  This is not surprising given that a carboxylate is not a gift that you really want to offer to a protein unless you’re sure that it will be properly appreciated.  The value of F(-1) is 0.056, which is very similar to the corresponding figure (0.053) for methyl.  Now let’s assume that we have a situation in which the carboxylate is an essential part of the pharmacophore. The question you really need to ask yourselves is how confident you are that you can measure potencies for both the parent compound and the analog when your substituent is carboxylate.  The next question is, knowing what you do about molecular recognition, would you be more or less optimistic about being able to measure the effect of methyl substitution on potency? &lt;br /&gt;&lt;br /&gt;Now it’s time to get back to the tails.  These are probably the most interesting regions within the distributions because they provide information about the best (and worst) we can expect to achieve by making a substitution.  Unfortunately the authors decided to trim the distributions prior to data analysis by removing potency changes that exceeded 4 standard deviations.  Think of all the poorly understood molecular recognition phenomena (topographically-focussed hydrophobic enclosure, electrostatically-enhanced conformational locking, hyperpolarised charge-octupole interactions, hyperconjugation-relayed field gradients) that might be lurking in those discarded tails.  What if some of these discarded results could have been interpreted in terms of the structures of the target proteins?&lt;br /&gt;&lt;br /&gt;So there you have it.  The effects on potency of a number of common substituents and we never even got beyond Table 1.  Essential information for drug discovery or philatelic use of the pages of a high impact journal? 
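&lt;br /&gt;&lt;br /&gt;Before we hand over to Our Loyal Readers, one final toy calculation on those discarded tails.  In this minimal Python sketch the potency changes are invented: most pairs are unremarkable, while 1% involve something special that produces a large gain.  We assume a single-pass trim of anything more than 4 standard deviations from the mean; the article’s exact trimming procedure may differ.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical potency changes (delta pIC50): 9,900 unremarkable pairs plus
# 100 pairs in which something special (and interesting) happens
ordinary = [rng.gauss(0.0, 0.7) for _ in range(9_900)]
special = [rng.gauss(4.0, 0.5) for _ in range(100)]
deltas = ordinary + special

mean = statistics.fmean(deltas)
sd = statistics.stdev(deltas)
cutoff = 4.0 * sd

# Single-pass trim: discard anything more than 4 SD from the mean
kept = [d for d in deltas if cutoff >= abs(d - mean)]
special_lost = sum(1 for d in special if abs(d - mean) > cutoff)

print(f"mean = {mean:.2f}, SD = {sd:.2f}, trimmed beyond +/- {cutoff:.2f} of the mean")
print(f"pairs discarded: {len(deltas) - len(kept)}, of which special: {special_lost}/{len(special)}")
print(f"largest gain before trimming: {max(deltas):.2f}, after: {max(kept):.2f}")
```

In this made-up example the trim removes most of the special pairs and few, if any, of the ordinary ones, which is rather the opposite of what a molecular designer would want.&lt;br /&gt;&lt;br /&gt;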
We are simple folk and we leave it to Our Loyal Readers to decide. &lt;br /&gt;&lt;br /&gt;Thankfully, we have only rarely encountered Doberman Pinschers. Our limited experience of this unsavoury breed suggests that they typically throw the best bits away when tail and Pinscher are separated.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-574863742593567993?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/574863742593567993/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=574863742593567993' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/574863742593567993'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/574863742593567993'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/04/substituents-potencies-and-pinschers.html' title='Substituents, Potencies and Pinschers'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-6927007494373830056</id><published>2008-04-07T15:30:00.000-07:00</published><updated>2010-09-26T15:07:40.535-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='stamp collecting'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title 
type='text'>Performance metrics for substituents</title><content type='html'>The essence of molecular design is being able to predict what effects structural modifications will have on the molecular properties that you’re interested in.   It’s obviously great if you can actually predict the properties themselves but predicting changes in properties may be easier and you’ve always got the option to perform some measurements on the starting points for the optimisation process.  During our recent entanglement with hydrogen bonds, an &lt;a href="http://dx.doi.org/10.1021/jm070838y" target="_window"&gt;article with a promising title&lt;/a&gt; appeared in a reasonably well-known journal with an impact factor fully worthy of the attentions of The Crapshoot.   Eagerly we read on, pausing briefly to ponder the relevance of &lt;a href="http://dx.doi.org/10.1021/ol0358915" target="_window"&gt;reference 2 &lt;/a&gt;that the authors had apparently selected at random for citation.&lt;br /&gt;&lt;br /&gt;When you’re quantifying the effects of structural changes on properties you first need to define the changes.  For aromatic rings, hydrogen is the obvious reference substituent.  So if you want to find out what a methyl group does for potency then you need to find all the molecular pairs in your potency database in which one molecule has a methyl and the other is identical except that hydrogen replaces methyl.  This is pretty much the approach of the authors of our featured article.  Let’s take a look at their results for acyclic substituents on aromatic rings which you’ll find in Table 1 of the article.&lt;br /&gt;&lt;br /&gt;Potency is the focus of the article and it’s common to use pIC50 (-log of IC50) when comparing potencies.    
The effect of a methyl group on potency is given by:&lt;br /&gt;&lt;br /&gt;pIC50 (X=Methyl) – pIC50 (X=Hydrogen)&lt;br /&gt;&lt;br /&gt;Once you’ve identified a number of these pairs, you can do all the normal statistical stuff like calculating means and standard deviations and that’s what the authors did.  They also aggregated the results for a number of different assays so the effects of structural changes are averaged over up to 30 different assays.  Hope you’re all still with us!  Now let’s take a look at what they found.&lt;br /&gt; &lt;br /&gt;The mean effect on potency ranged from -0.261 to 0.498 and the standard deviation ranged from 0.518 to 1.186.  This is exactly the sort of result that you’d expect because the averaging has been performed over multiple assays and multiple chemotypes.  We looked at the means and standard deviations in Table 1 and wondered how we might exploit them in molecular design.  We are still wondering but of course we are but simple folk.&lt;br /&gt;&lt;br /&gt;The authors of the featured study must have been thinking along similar lines.  The collection of means and standard deviations is of a distinctly philatelic aspect.  Not really the sort of thing that you can present to a journal of high impact factor as being at the cutting edge.   What is one to do in situations like this?  The answer is to present more statistics and that’s exactly what the authors did.  And here’s where it gets complicated so please pay close attention as we try to guide you through the minefield.  &lt;br /&gt;&lt;br /&gt;They defined two descriptors for the distribution associated with each substitution.  F(-1) is the probability of increasing potency by at least one log unit and F(1) is the probability of decreasing potency by at least one log unit.  The sign convention reflects the authors’ use of logIC50 rather than pIC50 but this is really not a problem.   
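&lt;br /&gt;&lt;br /&gt;For those who like to see the bookkeeping, here is a minimal Python sketch of the matched-pair calculation for a single hypothetical assay.  The pIC50 values are invented.  F(-1) and F(1) are computed in the authors’ logIC50 convention, and we assume that ‘by one log unit’ means ‘by at least one log unit’, which matches the slice used for the contingency analysis.

```python
import statistics

# Hypothetical matched pairs from one assay: (pIC50 with X = H, pIC50 with X = methyl)
pairs = [(6.1, 6.4), (5.2, 5.0), (7.3, 8.5), (6.8, 6.9), (5.5, 4.2), (6.0, 6.6)]

# Effect of the substituent: pIC50(X = methyl) - pIC50(X = H)
delta_p = [p_me - p_h for p_h, p_me in pairs]
# In logIC50 terms every change simply flips sign
delta_log = [-d for d in delta_p]

mean_effect = statistics.fmean(delta_p)
sd_effect = statistics.stdev(delta_p)

# F(-1): fraction of pairs where logIC50 falls by at least 1 (potency up 10-fold or more)
f_minus1 = sum(1 for d in delta_log if -1.0 >= d) / len(delta_log)
# F(1): fraction of pairs where logIC50 rises by at least 1 (potency down 10-fold or more)
f_plus1 = sum(1 for d in delta_log if d >= 1.0) / len(delta_log)

print(f"mean = {mean_effect:.3f}, SD = {sd_effect:.3f}")
print(f"F(-1) = {f_minus1:.3f}, F(1) = {f_plus1:.3f}")
```

Pooling across assays, as the authors did, would simply mean putting the pairs from every assay into the same list before computing these numbers.&lt;br /&gt;&lt;br /&gt;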
Each of these descriptors partitions each data set into two groups thus providing access to that most famous last refuge of the scoundrel:  the &lt;a href="http://en.wikipedia.org/wiki/Contingency_table" target="_window"&gt;contingency table&lt;/a&gt;.    &lt;br /&gt;&lt;br /&gt;Contingency tables are normally used to analyse categorical data.  For example, you have some dead smokers and some equally dead non-smokers who have died of lung cancer or something else equally deadly.  Analysis of the contingency table tells you whether more smokers than non-smokers have died of lung cancer and how significant it is.  Significance of course is not especially significant for these smokers and non-smokers because they’re all dead so it’s probably a good time to refer you to a couple of our earlier posts (&lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-1-categorical-sins.html" target="_window"&gt;1&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-2-sacred-cow-culled.html" target="_window"&gt;2&lt;/a&gt;) on categorical sins. &lt;br /&gt;&lt;br /&gt;Now back to the substituents.  The authors decided that methyl would be a good reference substituent and did a contingency table analysis for each substituent with respect to methyl.  This is how it works for the F(-1) descriptor and fluoro:&lt;br /&gt;&lt;br /&gt;Category 1: Methyl or fluoro&lt;br /&gt;Category 2: Change in logIC50 &lt;= -1 or change in logIC50 &gt; -1&lt;br /&gt;&lt;br /&gt;We’re happy with methyl or fluoro as a category just as we were happy with smoking/not smoking and lung cancer/other cause of death as categories.  We are rather less happy about slicing up a continuous distribution as a way to define categories.   We also worry that the choice of methyl as a reference substituent is a little arbitrary when you’ve got an entire deck of cards to choose from.&lt;br /&gt;&lt;br /&gt;We will elaborate on these and (a number of) other concerns in the next post. 
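&lt;br /&gt;&lt;br /&gt;Here is a minimal Python sketch of a contingency analysis of this kind, done with a Pearson chi-square test (the article may well have used a different test).  Both distributions of the change in logIC50 are invented normal distributions, and the counts are the expected counts for 1000 hypothetical pairs per substituent.  The point is simply to see what happens to the verdict when the slice point is moved away from -1.

```python
import math
from statistics import NormalDist

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and its p-value
    for the 2 x 2 table [[a, b], [c, d]], with one degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical distributions of the change in logIC50 (negative = more potent)
methyl = NormalDist(mu=-0.1, sigma=0.9)
fluoro = NormalDist(mu=-0.2, sigma=0.6)
N_PAIRS = 1000  # hypothetical number of matched pairs for each substituent

for cut in (-0.5, -1.0, -1.5):
    # Expected number of pairs at or below the slice point, rounded to whole pairs
    me_hit = round(N_PAIRS * methyl.cdf(cut))
    f_hit = round(N_PAIRS * fluoro.cdf(cut))
    # Rows: methyl, fluoro; columns: at or below the slice, above it
    chi2, p = chi_square_2x2(me_hit, N_PAIRS - me_hit, f_hit, N_PAIRS - f_hit)
    print(f"slice at {cut:+.1f}: methyl {me_hit}/{N_PAIRS}, fluoro {f_hit}/{N_PAIRS}, "
          f"chi-square = {chi2:.2f}, p = {p:.2g}")
```

With these made-up distributions, slicing at -0.5 gives no significant difference while slicing at -1.0 gives a very small p-value, so the verdict depends on where you choose to cut.&lt;br /&gt;&lt;br /&gt;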
We think it's great fun to split the material up like this because it helps build the suspense.  Until then we offer our most effusive and unctuous thanks to all our readers, especially in the state of Illinois, for reading The Great Molecular Crapshoot.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-6927007494373830056?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/6927007494373830056/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=6927007494373830056' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6927007494373830056'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6927007494373830056'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/04/performance-metrics-for-substituents.html' title='Performance metrics for substituents'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-6986528598108670673</id><published>2008-03-30T13:43:00.000-07:00</published><updated>2010-09-26T15:07:40.561-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='opinion'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>Blogging the literature</title><content type='html'>As our Loyal Readers will be aware, we devote a significant 
proportion of Crapshoot posts to placing peer-reviewed literature in the cross-hairs.  Some have said that literature posts are difficult and, on the evidence of our less than prolific output and the turgidity of the resulting posts, we would have to agree.&lt;br /&gt;&lt;br /&gt;There are a number of approaches to doing literature posts and these vary a great deal in the demands made of the blogger.  The trivial literature post involves a link to an article with no more comment than 'here is a good paper'.  The next level up is to summarise the article without adding any original insight or to bring some related articles together.  Depending on expertise and experience, these posts can be put together fairly quickly.  The two (&lt;a href="http://gmc2007.blogspot.com/2007/11/cambridge-one-gothenburg-nil.html" target="_window"&gt;1&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2007/11/misadventures-in-reciprocal-space.html" target="_window"&gt;2&lt;/a&gt;) appearances of the Red/Blue teams will give you an idea of what we're talking about here.&lt;br /&gt;&lt;br /&gt;However if you want to present a serious challenge to a published article, you'll need to put in some time.  Remember that it'll have got past two or three reviewers even though this will not always be apparent.  Most of the literature posts in The Crapshoot attempt to identify weaknesses in published articles and we are particularly motivated by a high journal impact factor and a large number of citations.  Our commentary on the proposed &lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-2-sacred-cow-culled.html" target="_window"&gt;link between rotatable bonds and oral bioavailability&lt;/a&gt; is a good example of this type of post. &lt;br /&gt;&lt;br /&gt;Much more difficult than identifying weaknesses in published literature is building on previous ideas and demonstrating their relevance in a different context.  
We would dearly love to do this in every single literature post and if you can do this consistently you really should be writing for a review journal.  The closest we believe we got to achieving this was in &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5.html" target="_window"&gt;the rule of 5&lt;/a&gt; and &lt;a href="http://gmc2007.blogspot.com/2008/01/molecules-for-simpletons.html" target="_window"&gt;molecules for simpletons &lt;/a&gt;posts.  But even there we fell well short of the ideal. &lt;br /&gt;&lt;br /&gt;So why do we bother with literature posts?  Some find this activity helps them gain a better understanding of the literature.  But this is not our motivation.  We post because much in the literature is accepted within Pharma as absolute fact.  Sheep or lemmings?  We leave it to you to decide.&lt;br /&gt;&lt;br /&gt;But enough of these musings because we have much bigger fish to fry.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-6986528598108670673?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/6986528598108670673/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=6986528598108670673' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6986528598108670673'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6986528598108670673'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/03/blogging-literature.html' title='Blogging the literature'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-1017216700689552861</id><published>2008-03-26T14:32:00.000-07:00</published><updated>2010-09-26T15:07:40.563-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='update'/><title type='text'>Blogroll purge</title><content type='html'>It's Night Of The Long Knives at The Crapshoot as we purge The Blogroll.  Most of the purgees have not posted for a while and many of our Loyal Readers will scarcely notice that we have done anything at all.  &lt;br /&gt;&lt;br /&gt;However we have also decided to de-Blogroll the &lt;a href="http://blogs.nature.com/thescepticalchymist/" target="_window"&gt;Sceptical Chymist&lt;/a&gt;.  Spare a thought for the folk in publishing who put out blogs like these.  First, somebody else is likely to have decided that you're going to blog.  Then you can't say anything negative about anything that gets written in any of your journals because it would imply that one of your colleagues wasn't doing his/her job properly.  Next you can't say anything negative about anything that gets written in somebody else's journals because two (or more!) can play the game of escalation and getting into a spat with your opposite numbers at a journal with a lower impact factor is not going to help at annual review time.&lt;br /&gt;&lt;br /&gt;We accept it's going to be difficult for a publisher blog to be as sceptical as &lt;a href="http://en.wikipedia.org/wiki/Robert_Boyle" target="_window"&gt;Robert Boyle&lt;/a&gt;.  We have done our best to help and on one occasion triggered this &lt;a href="http://blogs.nature.com/thescepticalchymist/2007/07/i_cant_live_without_my_radiofr.html " target="_window"&gt;interesting exchange&lt;/a&gt;. 
We did not expect any thanks for injecting some fizz into an anodyne literature posting and this was just as well because we didn't get any.  Much more deserving of thanks was one of the authors of the featured literature who took the time to respond to our anonymous comments.  We would like to see more of this sort of thing and perhaps if journals provided good facilities to comment on published articles we might do so in a less anonymous manner.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-1017216700689552861?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/1017216700689552861/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=1017216700689552861' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1017216700689552861'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1017216700689552861'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/03/blogroll-purge.html' title='Blogroll purge'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-3867507898193812729</id><published>2008-03-09T12:58:00.000-07:00</published><updated>2010-09-26T15:07:40.569-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='molecular recognition'/><category scheme='http://www.blogger.com/atom/ns#' 
term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='hydrogen bonding'/><title type='text'>Neutral-neutral hydrogen bonds: The verdict</title><content type='html'>Well we do seem to have let things slip a bit.  Truth be told, we’re getting a little sick of hydrogen bonds and are longing to get back to important things like &lt;a href="http://en.wikipedia.org/wiki/Philately" target="_window"&gt;philately&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Druglikeness" target="_window"&gt;druglikeness&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Lean_Six_Sigma" target="_window"&gt;lean six sigma&lt;/a&gt;.  However we do take our responsibilities to our loyal readers seriously and will finish that with which we have tasked ourselves.  For those of you who’ve only just joined us, this is the conclusion of a series (&lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-will-it-do-for-me.html" target="_window"&gt;1&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;2&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2008/01/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;3&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2008/01/molecules-for-simpletons.html" target="_window"&gt;4&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2008/02/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;5&lt;/a&gt;) of Crapshoots that has focussed on the assertion that a neutral-neutral hydrogen bond will contribute no more than 1.5kcal/mol to the stability of a protein-ligand complex.  What are the precise origins of this figure and how did it come to be asserted quite so confidently?   
These questions are actually as much about the functioning of the scientific establishment as they are about hydrogen bonding and this is the real reason for our interest in this work.&lt;br /&gt;&lt;br /&gt;The figure of 1.5kcal/mol made its &lt;a href="http://dx.doi.org/10.1038/314235a0" target="_window"&gt;debut in 1985&lt;/a&gt;.  It was based on 3 neutral-neutral hydrogen bonds, all of which involve a hydroxyl group on either the ligand or the protein (&lt;a href="http://gmc2007.blogspot.com/2008/02/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;see Crapshoot&lt;/a&gt;).  We were unconvinced that the contribution of any of these hydrogen bonds represented the maximum that we might expect for a neutral-neutral hydrogen bond.  Hydroxyl groups are one reason and the large number of hydrogen bonds between protein and ligand is another.  Now take a look at the last section of the article entitled ‘Biological specificity’.  Are the authors just writing about their protein-ligand complex or do they claim a more general scope for their findings?  They refer to ‘the enzyme’ but ‘a substrate’ so we have to admit we just don’t know.   We can’t believe that anyone would seriously claim to have found the upper limit in a sample of just three so we’ll assume that it’s just a matter of wording.&lt;br /&gt;&lt;br /&gt;A year later the figure of 1.5kcal/mol &lt;a href="http://dx.doi.org/10.1021/bi00368a028" target="_window"&gt;cropped up again&lt;/a&gt;.  This time the hydrogen bonding groups were deleted from the ligand rather than the protein and coincidentally hydroxyl groups were involved in all cases (&lt;a href="http://gmc2007.blogspot.com/2008/01/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;see Crapshoot&lt;/a&gt;).   The authors suggested that their results were consistent with the reported figure of 1.5kcal/mol and we do not believe that they claimed generality for their findings.  
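&lt;br /&gt;&lt;br /&gt;To put a rough number on the sample-size point, here is a minimal Python sketch.  It assumes, generously, that the hydrogen bonds studied were drawn independently and at random from the whole population of neutral-neutral hydrogen bonds, and asks how likely a sample of a given size is to include even one bond from the strongest 1%, 5% or 10% of that population.

```python
def chance_sample_reaches_top(n, top_fraction):
    """Probability that at least one of n independent random draws comes from
    the strongest `top_fraction` of the population."""
    return 1.0 - (1.0 - top_fraction) ** n

for n in (3, 10, 30):
    row = ", ".join(
        f"top {f:.0%}: {chance_sample_reaches_top(n, f):.2f}" for f in (0.01, 0.05, 0.10)
    )
    print(f"sample of {n:2d} hydrogen bonds -> {row}")
```

For a sample of three, the chance of catching a bond from the top 10% is only about 27%, and it falls to about 3% for the top 1%.&lt;br /&gt;&lt;br /&gt;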
Once again we would not expect the contribution of any of these hydrogen bonds to be at the upper limit for a neutral-neutral hydrogen bond because there are too many of them (&lt;a href="http://gmc2007.blogspot.com/2008/01/molecules-for-simpletons.html" target="_window"&gt;see molecular complexity argument&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Then the &lt;a href="http://www.pnas.org/cgi/content/abstract/90/4/1172" target="_window"&gt;amide-amide hydrogen bond study&lt;/a&gt; appeared in 1993 (&lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;see Crapshoot&lt;/a&gt;).  The amides have a similar problem to the hydroxyls in that deploying NH as a donor may compromise the solvation of the carbonyl oxygen acceptor.  We identified a couple of issues with the analysis and do not believe that these measurements support the adoption of 1.5kcal/mol as an upper limit for the contribution of a neutral-neutral hydrogen bond.&lt;br /&gt;&lt;br /&gt;While the articles featured represent valuable contributions to the literature, the number of hydrogen bonds sampled is low and their variety narrow.  Far too low and narrow for us to be confident that the maximally contributing hydrogen bond was represented in the sample.  Yet &lt;a href="http://www3.interscience.wiley.com/cgi-bin/abstract/55000580/ABSTRACT" target="_window"&gt;the assertion was made that 1.5kcal/mol was as good as it would get&lt;/a&gt;.  Why was this?  Did the makers of that assertion think that the results of the primary literature were of broader scope than they actually were?  Did they consider how representative these hydrogen bonds were for drug-protein complexes?  Did 1.5kcal/mol sit more comfortably in their review than, say, 2.5kcal/mol?  Did they get spooked by the impact factor of the journal in which the figure of 1.5kcal/mol first appeared?&lt;br /&gt; &lt;br /&gt;Lots of unanswered questions, but here’s an interesting thought experiment.  
Imagine you have a protein and you convert an amide NH into an ester oxygen.  Suppose that the amide NH functions as a hydrogen bond donor and you believe that a hydrogen bond contributes no more than 1.5kcal/mol.  Then the ester mutant should be no more than 1.5kcal/mol less stable than the wild-type protein.  Wait, we hear you cry, the ester carbonyl oxygen will be a weaker acceptor than the amide oxygen.  Fair point, we respond, so we’ll let you set the hydrogen bond involving the amide oxygen to the maximum of 1.5kcal/mol and that involving the ester oxygen to zero, which is actually a very big concession.  So hopefully you’ll agree that converting an amide in a protein to an ester is not going to destabilise the protein by more than 3kcal/mol.&lt;br /&gt;&lt;br /&gt;Now, as you’ll have guessed, this isn’t just a thought experiment. Folk have actually &lt;a href="http://dx.doi.org/10.1016/S0065-3233(05)72002-7" target="_window"&gt;mutated amides into esters and measured effects on protein stability&lt;/a&gt; (see Table 2).  Now a quick scan of the relevant table will show that this mutation reduces protein stability by over 3kcal/mol in a number of cases, the largest figure being 4.8kcal/mol. Even the most innumerate will concede this is a little larger than 3kcal/mol.  There is of course a small detail that we haven’t mentioned and if nobody comments on it we’ll leave it at that.&lt;br /&gt;&lt;br /&gt;This concludes our long and tortured look at neutral-neutral hydrogen bonds.  We have traced the figure of 1.5kcal/mol from its debut in 1985 to its being quoted as an upper limit for all neutral-neutral hydrogen bonds.  
We hope that you will now take a closer look at what lies beneath when numbers like these get presented as facts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-3867507898193812729?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/3867507898193812729/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=3867507898193812729' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3867507898193812729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3867507898193812729'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/03/neutral-neutral-hydrogen-bonds-verdict.html' title='Neutral-neutral hydrogen bonds: The verdict'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5975501108861308216</id><published>2008-02-20T14:22:00.000-08:00</published><updated>2010-09-26T15:07:40.572-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='molecular recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='hydrogen bonding'/><title type='text'>A hydrogen bond: What did it do for them? 
Part 3</title><content type='html'>We turn now to the &lt;a href="http://www.nature.com/nature/journal/v314/n6008/abs/314235a0.html" target="_window"&gt;remaining article &lt;/a&gt;cited in support of the &lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-will-it-do-for-me.html" target="_window"&gt;assertion that a neutral-neutral hydrogen bond will contribute no more that 1.5kcal/mol to binding affinity&lt;/a&gt;. We have not found the evidence (see parts &lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;1&lt;/a&gt; and &lt;a href="http://gmc2007.blogspot.com/2008/01/hydrogen-bond-what-did-it-do-for-them.html" target="_window" &gt;2&lt;/a&gt;) presented so far in support of this claim to be overly convincing so we're desperately hoping that this article will clear the rather muddy waters. The article was in fact the first one to be cited and pre-dates the other two. For those of our loyal readers who care about such trivia, the journal has a high impact factor. Why have we left it until last to review this article? We have our reasons which may become apparent to some of you.&lt;br /&gt;&lt;br /&gt;Before we get stuck into the business at hand, let's take a quick look at what we've learned so far. First, if you're going to claim that you've established an upper limit for the contribution of a hydrogen bond you do need to demonstrate that your hydrogen bonds are optimal in terms of geometry and solvent exposure.  We also learned that you can't really take the contributions of one type of acceptor (e.g amide oxygen or hydroxyl) and extrapolate them to other types of acceptor (e.g. aromatic nitrogen).  Lastly, the contribution of a hydrogen bond may well depend on the number of other intermolecular hydrogen bonds between ligand and protein.&lt;br /&gt;&lt;br /&gt;Delighted that you've taken all this in because it's time to take a look at the featured article.  
This is a well known, heavily-cited publication that describes the use of protein engineering to analyse hydrogen bonding and biological specificity.  The enzyme is tyrosyl-tRNA synthetase and we should point out at the outset that it's a nice paper.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/_9pt-rDMtsM4/R7ImA2pNWDI/AAAAAAAAAA0/EaJ3GBotC-4/s1600-h/image002.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_9pt-rDMtsM4/R7ImA2pNWDI/AAAAAAAAAA0/EaJ3GBotC-4/s400/image002.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5166233518657591346" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The starting point for the analysis is a crystal structure (see fig 1 in the article) of tyrosyl adenylate (see above) bound to the enzyme.  Amino acids whose side chains are observed to form hydrogen bonds with the ligand are systematically modified (site-directed mutagenesis) and the effects of these mutations are quantified by comparing kcat/Km values with those for the wild-type enzyme. There are 11 hydrogen bonds between ligand and protein.  Five of these can be counted as having a charged partner either in the ligand or in the protein and five can be regarded as true neutral-neutral hydrogen bonds.  Just in case you thought we were losing it, the eleventh hydrogen bond is that between GLN195 and the carbonyl oxygen.  Although this looks like a neutral-neutral hydrogen bond, it isn't really.  That carbonyl oxygen is one of the carboxylate oxygen atoms in the E.Tyr complex and it is no surprise to learn that this is the most important hydrogen bond for stabilising the transition state. &lt;br /&gt;&lt;br /&gt;Two of the five neutral-neutral hydrogen bonds involve backbone atoms, leaving three that can be probed by conventional mutagenesis.  These involve the side chains of CYS35, THR51 and TYR34 and none appears to contribute more than 1.18kcal/mol.  
THR51 and TYR34 both deploy hydroxyl groups and you've already heard our &lt;a href="http://gmc2007.blogspot.com/2008/01/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;concerns about interpreting contributions of hydroxyl groups&lt;/a&gt;.  Interestingly, it is the thiol of CYS35 that appears to make the largest contribution despite thiols being weaker hydrogen bond acceptors than hydroxyls.  &lt;br /&gt;&lt;br /&gt;So there you have it.  We have contributions of three neutral-neutral hydrogen bonds.  How likely do you think it is that one of these represents the upper limit for a neutral-neutral hydrogen bond? &lt;br /&gt;&lt;br /&gt;We have now reviewed the evidence presented by the defence in support of the assertion that a neutral-neutral hydrogen bond will contribute no more than 1.5kcal/mol.  We hope that you have enjoyed the journey or at least found it to be a character building process.  In the next Crapshoot we will pass judgement.  Will it be 10 hours of community service or 10 minutes of Old Sparky?  
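For readers who think in fold-changes rather than kcal/mol, here is a rough sketch (ours, not the article's) of the conversion, assuming ΔG = -RT ln K and a temperature of 298 K. The temperature is our assumption and not a value taken from the article. The same relation applies to ratios of kcat/Km, so it gives a feel for what 1.18kcal/mol and 1.5kcal/mol mean as factors.

```python
import math

# Gas constant in kcal/(mol K). T = 298.15 K is an assumption, not a value from the article.
R_KCAL = 1.987204e-3
T_KELVIN = 298.15


def kcal_to_fold(ddg_kcal: float, temperature: float = T_KELVIN) -> float:
    """Fold-change in K (or kcat/Km) equivalent to a free energy difference: ratio = exp(ddG / RT)."""
    return math.exp(ddg_kcal / (R_KCAL * temperature))


def fold_to_kcal(fold: float, temperature: float = T_KELVIN) -> float:
    """Free energy difference (kcal/mol) equivalent to a fold-change: ddG = RT ln(ratio)."""
    return R_KCAL * temperature * math.log(fold)


if __name__ == "__main__":
    for ddg in (1.18, 1.5):
        print(f"{ddg:.2f} kcal/mol is about {kcal_to_fold(ddg):.1f}-fold")
```

Run as written, this puts 1.18kcal/mol at roughly 7-fold and 1.5kcal/mol at roughly 13-fold at 298 K. Assume a different temperature and the numbers shift a little.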
&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2008/03/neutral-neutral-hydrogen-bonds-verdict.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5975501108861308216?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5975501108861308216/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5975501108861308216' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5975501108861308216'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5975501108861308216'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/02/hydrogen-bond-what-did-it-do-for-them.html' title='A hydrogen bond: What did it do for them? 
Part 3'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_9pt-rDMtsM4/R7ImA2pNWDI/AAAAAAAAAA0/EaJ3GBotC-4/s72-c/image002.gif' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5426632845091894041</id><published>2008-02-10T14:33:00.000-08:00</published><updated>2011-01-08T19:19:12.326-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='molecular recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='hydrogen bonding'/><title type='text'>A hydrogen bond: What did it do for them? Part 3</title><content type='html'>We turn now to the &lt;a href="http://dx.doi.org/10.1038/314235a0" target="_window"&gt;remaining article&lt;/a&gt; cited in support of the &lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-will-it-do-for-me.html" target="_window"&gt;assertion that a neutral-neutral hydrogen bond will contribute no more than 1.5kcal/mol to binding affinity&lt;/a&gt;. We have not found the evidence (see parts &lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;1&lt;/a&gt; and &lt;a href="http://gmc2007.blogspot.com/2008/01/hydrogen-bond-what-did-it-do-for-them.html" target="_window" &gt;2&lt;/a&gt;) presented so far in support of this claim to be overly convincing, so we're desperately hoping that this article will clear the rather muddy waters. The article was in fact the first one to be cited and pre-dates the other two. 
For those of our loyal readers who care about such trivia, the journal has a high impact factor. Why have we left it until last to review this article? We have our reasons, which may become apparent to some of you.&lt;br /&gt;&lt;br /&gt;Before we get stuck into the business at hand, let's take a quick look at what we've learned so far. First, if you're going to claim that you've established an upper limit for the contribution of a hydrogen bond, you do need to demonstrate that your hydrogen bonds are optimal in terms of geometry and solvent exposure.  We also learned that you can't really take the contributions of one type of acceptor (e.g. amide oxygen or hydroxyl) and extrapolate them to other types of acceptor (e.g. aromatic nitrogen).  Lastly, the contribution of a hydrogen bond may well depend on the number of other intermolecular hydrogen bonds between ligand and protein.&lt;br /&gt;&lt;br /&gt;Delighted that you've taken all this in because it's time to take a look at the featured article.  This is a well known, heavily-cited publication that describes the use of protein engineering to analyse hydrogen bonding and biological specificity.  The enzyme is tyrosyl-tRNA synthetase and we should point out at the outset that it's a nice paper.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://bp0.blogger.com/_9pt-rDMtsM4/R7ImA2pNWDI/AAAAAAAAAA0/EaJ3GBotC-4/s1600-h/image002.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://bp0.blogger.com/_9pt-rDMtsM4/R7ImA2pNWDI/AAAAAAAAAA0/EaJ3GBotC-4/s400/image002.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5166233518657591346" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The starting point for the analysis is a crystal structure (see fig 1 in the article) of tyrosyl adenylate (see above) bound to the enzyme.  
Amino acids whose side chains are observed to form hydrogen bonds with the ligand are systematically modified (site-directed mutagenesis) and the effects of these mutations are quantified by comparing kcat/Km values with those for the wild-type enzyme. There are 11 hydrogen bonds between ligand and protein.  Five of these can be counted as having a charged partner either in the ligand or in the protein and five can be regarded as true neutral-neutral hydrogen bonds.  Just in case you thought we were losing it, the eleventh hydrogen bond is that between GLN195 and the carbonyl oxygen.  Although this looks like a neutral-neutral hydrogen bond, it isn't really.  That carbonyl oxygen is one of the carboxylate oxygen atoms in the E.Tyr complex and it is no surprise to learn that this is the most important hydrogen bond for stabilising the transition state. &lt;br /&gt;&lt;br /&gt;Two of the five neutral-neutral hydrogen bonds involve backbone atoms, leaving three that can be probed by conventional mutagenesis.  These involve the side chains of CYS35, THR51 and TYR34 and none appears to contribute more than 1.18kcal/mol.  THR51 and TYR34 both deploy hydroxyl groups and you've already heard our &lt;a href="http://gmc2007.blogspot.com/2008/01/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;concerns about interpreting contributions of hydroxyl groups&lt;/a&gt;.  Interestingly, it is the thiol of CYS35 that appears to make the largest contribution despite thiols being weaker hydrogen bond acceptors than hydroxyls.  &lt;br /&gt;&lt;br /&gt;So there you have it.  We have contributions of three neutral-neutral hydrogen bonds.  How likely do you think it is that one of these represents the upper limit for a neutral-neutral hydrogen bond? &lt;br /&gt;&lt;br /&gt;We have now reviewed the evidence presented by the defence in support of the assertion that a neutral-neutral hydrogen bond will contribute no more than 1.5kcal/mol.  
We hope that you have enjoyed the journey or at least found it to be a character building process.  In the next Crapshoot we will pass judgement.  Will it be 10 hours of community service or 10 minutes of Old Sparky?  &lt;br /&gt;&lt;br /&gt;SMILES for indexing&lt;br /&gt;&lt;span class="chem:smiles"&gt;[NH3+][C@@H](Cc4ccc(O)cc4)C(=O)OP(=O)([O-])OC[C@@H]1[C@@H](O)[C@@H](O)[C@@H](O1)n2cnc3c2ncnc3N&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5426632845091894041?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5426632845091894041/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5426632845091894041' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5426632845091894041'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5426632845091894041'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/02/hydrogen-bond-what-did-it-do-for-them_10.html' title='A hydrogen bond: What did it do for them? 
Part 3'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_9pt-rDMtsM4/R7ImA2pNWDI/AAAAAAAAAA0/EaJ3GBotC-4/s72-c/image002.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-3835870669873521080</id><published>2008-01-30T11:30:00.000-08:00</published><updated>2010-09-26T15:07:40.588-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='molecular recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='hydrogen bonding'/><title type='text'>Molecules for simpletons</title><content type='html'>We now pause briefly in our survey (&lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-will-it-do-for-me.html" target="_window"&gt;intro&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;part 1&lt;/a&gt; | &lt;a href="http://gmc2007.blogspot.com/2008/01/hydrogen-bond-what-did-it-do-for-them.html" target="_window"&gt;part 2&lt;/a&gt;) of the contributions of neutral-neutral hydrogen bonds to binding.  We will take a look at the concept of molecular complexity.&lt;br /&gt;&lt;br /&gt;The concept of molecular complexity was &lt;a href="http://dx.doi.org/10.1021/ci000403i" target="_window"&gt;articulated in a 2001 article&lt;/a&gt;, which is one of our favorites.  The basic idea is simple.  Complex molecules are expected to bind more tightly to their targets than less complex molecules, provided that they achieve an optimal fit. 
The sting in the tail is that the probability of achieving that optimal fit decreases with molecular complexity.  &lt;br /&gt;&lt;br /&gt;Let's put it all together. Take a look at the plot below, which corresponds to Figure 3 in the featured article, and please accept our apologies for the poor quality graphic.  The mauve line is the probability of measuring binding, assuming that the compound does indeed bind, and the blue line is the probability of matching just one way.  The yellow line is the product of these two probabilities and represents the probability of what the authors term a useful event.  We are not convinced that binding more than one way is not useful.  However, this is a minor detail and it has no qualitative effect on the shape of the all-important yellow curve.  It has a maximum. &lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/_9pt-rDMtsM4/R6eNRe4aAKI/AAAAAAAAAAs/MqxPhhFfKMw/s1600-h/image002.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_9pt-rDMtsM4/R6eNRe4aAKI/AAAAAAAAAAs/MqxPhhFfKMw/s400/image002.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5163250829290504354" /&gt;&lt;/a&gt;&lt;br /&gt;Molecular complexity (increasing from left to right)&lt;br /&gt;&lt;br /&gt;What does this picture tell us?  If you're randomly testing compounds, you need to get the complexity just right.  If the compounds have too few molecular recognition elements, they can't bind tightly enough to be detected in the assay.  Unless of course you've got a special assay (e.g. protein-detected NMR), which is the essence of fragment screening.  Go to the other extreme, however, and you'll find that a molecule with lots of molecular recognition elements is unlikely to be able to deploy them all simultaneously.  So you screen at low complexity and then make your leads more complex, increasing their potency against their intended target, all the while decreasing their chances of binding to the anti-targets.  
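If you'd like to convince yourself that a rising curve multiplied by a falling one will tend to have a maximum, here's a toy sketch. The functional forms and the parameters K_DETECT and Q_MATCH are made up by us purely for illustration and are not the model used in the featured article. The chance of measuring binding rises as 1 - exp(-k n) with the number n of recognition elements. The chance of matching just one way falls as q raised to the power n.

```python
import math

# Hypothetical parameters, chosen only to give curves of the right general shape.
K_DETECT = 0.5  # how quickly the chance of detecting binding rises with complexity
Q_MATCH = 0.8   # toy chance that each extra recognition element still fits


def p_measure(n: int) -> float:
    """Toy probability that binding is strong enough to be measured (rises with n)."""
    return 1.0 - math.exp(-K_DETECT * n)


def p_match(n: int) -> float:
    """Toy probability of matching just one way (falls with n)."""
    return Q_MATCH ** n


def p_useful(n: int) -> float:
    """Product of the two: the analogue of the yellow curve."""
    return p_measure(n) * p_match(n)


if __name__ == "__main__":
    scores = {n: p_useful(n) for n in range(0, 21)}
    best = max(scores, key=scores.get)
    print(f"toy optimum at complexity n = {best}, p = {scores[best]:.3f}")
```

With these invented numbers the product peaks at low complexity and then declines. Making detection easier (a larger K_DETECT) moves the peak towards simpler molecules.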
Isn't Drug Discovery easy?  There are still unanswered questions.  What is low complexity?  Simple, it depends on your assay.  But how can that be, Master? Surely complexity is a purely molecular property.  That is true, Grasshopper, complexity is indeed molecular but its degree is assay-dependent.&lt;br /&gt;&lt;br /&gt;And you haven't even told us what molecular complexity is!  What does this load of bullshit have to do with the contribution of hydrogen bonds to binding energies?  Patience, Dear Reader, all will be revealed.  Molecular complexity can take many forms depending on whether you're trying to synthesise the molecule or interpret its NMR spectrum.  Bigger molecules tend to be more complex because there are limits to the number of chiral centers and spiro ring fusions you can accommodate in a molecule with a molecular weight of 42.  In molecular recognition, hydrogen bonding groups are important elements of molecular complexity.  This is so because their interactions with target and aqueous solvent are highly directional.  The solvent is more adaptable than the geometrically constrained target, but not infinitely so.  If you're going to yank all of those hydrogen bonding groups out of water, you'd better have some binding partners lined up in the protein.  In exactly the right places.&lt;br /&gt;&lt;br /&gt;By now you're thinking this all sounds very &lt;a href="http://en.wikipedia.org/wiki/Second_Law_of_Thermodynamics"&gt;Second Law&lt;/a&gt;, very &lt;a href="http://en.wikipedia.org/wiki/Maxwell%27s_demon"&gt;Maxwell's demon&lt;/a&gt;.  And that's exactly what you should be thinking, so congratulations on getting there. It is a privilege to write for such clever readers.  Molecular complexity is about entropy, even before you start to think about conformational flexibility.  And that brings us back to the &lt;a href="http://dx.doi.org/10.1021/bi00368a028" target="_window"&gt;binding of glucose analogs to glycogen phosphorylase&lt;/a&gt;.  
Glucose has 5 hydroxyl groups which interact with the protein and most of these form hydrogen bonds to more than one residue.  So the probability that every single one of these hydrogen bonds will be of optimum geometry is low.  &lt;br /&gt;&lt;br /&gt;Now let's take a look at what happens when you get rid of one of the hydroxyl groups.  The hydrogen bonds for the 4 remaining hydroxyl groups are no longer compromised by the geometric requirements of the hydroxyl group that you just zapped.  We expect, on the basis of molecular complexity, that removal of one hydroxyl group will strengthen the individual contributions of the others.  If you removed hydroxyl groups one at a time, their apparent contributions to binding energy would depend on the order in which they were removed and, by implication, on how many had already been removed.&lt;br /&gt;&lt;br /&gt;This brings us to the conclusion of our discussion.  Do you believe that the effect on binding of removing one of glucose's 5 hydroxyl groups will accurately predict an upper limit for the contribution of a neutral-neutral hydrogen bond?  It is not for us to say, for we are but simple folk.  
So just take this pebble from my hand and it will be time for you to go.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2008/02/hydrogen-bond-what-did-it-do-for-them.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-3835870669873521080?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/3835870669873521080/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=3835870669873521080' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3835870669873521080'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3835870669873521080'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/01/molecules-for-simpletons.html' title='Molecules for simpletons'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_9pt-rDMtsM4/R6eNRe4aAKI/AAAAAAAAAAs/MqxPhhFfKMw/s72-c/image002.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-6547527375398463547</id><published>2008-01-13T14:25:00.001-08:00</published><updated>2010-09-26T15:07:40.599-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='molecular recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category 
scheme='http://www.blogger.com/atom/ns#' term='hydrogen bonding'/><title type='text'>A hydrogen bond: What did it do for them? Part 2</title><content type='html'>In our quest to discover what a hydrogen bond is really worth, we take a look at &lt;a href="http://dx.doi.org/10.1021/bi00368a028" target="_window"&gt;inhibition of glycogen phosphorylase&lt;/a&gt; by &lt;a href="http://en.wikipedia.org/wiki/Glucose" target="_window"&gt;D-glucose.&lt;/a&gt;  This article has been cited in support of the &lt;a href="http://www3.interscience.wiley.com/cgi-bin/abstract/55000580/ABSTRACT" target="_window"&gt;assertion that a neutral-neutral hydrogen bond contributes no more than a factor of 15 to binding affinity.&lt;/a&gt; &lt;br /&gt;&lt;br /&gt;Let's take a look at D-glucose.  It exists as a mixture of anomers and the authors of the featured article suggest that the alpha form binds to glycogen phosphorylase about 3-fold more strongly than the more abundant beta form.  You'll also notice some hydroxyl groups.  Quite a few hydroxyl groups in fact, and we'll try to explore the significance of this observation in the next post. &lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/_9pt-rDMtsM4/R5OkeilFXoI/AAAAAAAAAAc/6MhY26wQwDs/s1600-h/glucose.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_9pt-rDMtsM4/R5OkeilFXoI/AAAAAAAAAAc/6MhY26wQwDs/s320/glucose.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5157646842854727298" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The Ki values for the two anomeric forms of glucose provide the reference.  All we need to do now is measure Ki values for close analogs of the reference and we'll know what the hydrogen bonds contribute to binding.  The authors of the featured study removed hydroxyl groups or replaced them with fluorine atoms.  
Fluorine is thought to be a slightly weaker hydrogen bond acceptor than hydroxyl oxygen but cannot function as a donor.  The authors suggest using the 1mM Ki measured for the alpha anomer as the basis for comparisons and this appears to be a sensible suggestion.  &lt;br /&gt;&lt;br /&gt;Let's see what happens for the hydroxyl at C1.  Removal of the hydroxyl (1-deoxy-D-glucose) results in a Ki of 11mM, while replacement of the hydroxyl with fluorine is slightly favorable (Ki = 0.6mM) for binding.  The crystal structure of the complex of glycogen phosphorylase with glucose suggested that the 1-hydroxyl did not function as a hydrogen bond donor. The C2 hydroxyl shows a broadly similar profile, although the Ki for the deoxy analog is 27mM and the fluoro analog is not anomerically pure.  Removal of the hydroxyl group at C3, C4 or C6 leads in each case to a Ki value that is quoted as &gt;&gt; 100mM and the fluoro analogs all show weaker (25-200) binding than their parents.&lt;br /&gt;&lt;br /&gt;So what are we to make of all this?  According to Table II in the featured article, the hydroxyl at C1 accepts a hydrogen bond from Leu136, so the 11-fold decrease in affinity resulting from deletion of this hydroxyl can be linked to that hydrogen bond.  The real question, however, is whether this hydrogen bond represents the upper limit of what a neutral-neutral hydrogen bond can contribute. &lt;br /&gt;&lt;br /&gt;The oxygen of the C1 hydroxyl is linked to the ring oxygen by a single carbon and to the C2 hydroxyl by two carbons.  These structural features weaken the C1 hydroxyl as a hydrogen bond acceptor.  But there is another problem.  In bulk water both the oxygen and hydrogen atoms of the C1 hydroxyl form hydrogen bonds with water and in doing so strengthen each other's interactions in a synergistic (or cooperative) manner.  So the next question is whether accepting a hydrogen bond from Leu136 compromises the ability of the C1 hydroxyl to donate hydrogen bonds to water.  
If so, this represents a thermodynamic penalty that must be paid (like income tax) in order for the hydrogen bond to form.  You have to be absolutely certain that this is not happening if you are going to present this as an upper limit for the contribution of a neutral-neutral hydrogen bond to binding affinity.&lt;br /&gt;&lt;br /&gt;The interactions of the other hydroxyls with the protein are more complicated.  Each forms more than one hydrogen bond with the protein and the deoxy analogs are not anomerically pure.  The effect of deleting these hydroxyl groups is clearly catastrophic but we need to know how catastrophic if we are to establish upper limits for the contributions of hydrogen bonds.  Also two (C3 &amp; C6) of the hydroxyl groups appear to be interacting with charged amino acid side chains. &lt;br /&gt;&lt;br /&gt;We like this paper and recommend it to anybody with an interest in molecular recognition of carbohydrates.  However, the real question here is whether the contributions to binding of the hydroxyl groups at C1, C2 and C4 support the &lt;a href="http://www3.interscience.wiley.com/cgi-bin/abstract/55000580/ABSTRACT" target="_window"&gt;statement that a neutral-neutral hydrogen bond contributes no more than 15-fold to binding affinity.&lt;/a&gt; We are simple folk and leave that to you, the reader, to decide.&lt;br /&gt;&lt;br /&gt;But there is another issue which we've not yet touched on, and you'll need to wait until the next post to find out about Molecules for Simpletons.  We hope your day has been enriched by this Crapshoot and that you'll drop by again real soon. 
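The Ki values quoted in this post can be turned into apparent free energy changes with ΔΔG = RT ln(Ki(analog)/Ki(reference)). Here's a rough sketch of that arithmetic. We have assumed a temperature of 298 K, which is our choice and not a value from the article. As the authors suggest, the 1mM Ki for the alpha anomer serves as the reference. Analogs quoted only as much greater than 100mM are left out because they give no more than a lower bound.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)
T_KELVIN = 298.15     # assumed temperature; not taken from the article

KI_REFERENCE_MM = 1.0  # alpha-D-glucose, the reference suggested by the authors

# Ki values (mM) quoted in the text; entries quoted only as a lower bound are omitted
KI_ANALOGS_MM = {
    "1-deoxy-D-glucose": 11.0,
    "1-fluoro analog": 0.6,
    "2-deoxy-D-glucose": 27.0,
}


def ddg_kcal(ki_analog_mm: float, ki_reference_mm: float = KI_REFERENCE_MM) -> float:
    """Apparent binding penalty in kcal/mol; positive means the analog binds more weakly."""
    return R_KCAL * T_KELVIN * math.log(ki_analog_mm / ki_reference_mm)


if __name__ == "__main__":
    for name, ki in KI_ANALOGS_MM.items():
        fold = ki / KI_REFERENCE_MM
        print(f"{name}: {fold:.1f}-fold, ddG = {ddg_kcal(ki):+.2f} kcal/mol")
```

Bear in mind that these are apparent values for deleting or replacing a whole hydroxyl group. They are not clean measurements of a single hydrogen bond. The 2-deoxy figure also carries the anomeric caveats discussed above.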
&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2008/01/molecules-for-simpletons.html"&gt;next&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;SMILES for indexing         &lt;br /&gt;&lt;span class="smiles"&gt;O1[C@@H](O)[C@H](O)[C@@H](O)[C@H](O)[C@H]1CO&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;O1[C@H](O)[C@H](O)[C@@H](O)[C@H](O)[C@H]1CO&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;O1[C@@H][C@H](O)[C@@H](O)[C@H](O)[C@H]1CO&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;O1[C@@H](F)[C@H](O)[C@@H](O)[C@H](O)[C@H]1CO&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;O1[C@H](F)[C@H](O)[C@@H](O)[C@H](O)[C@H]1CO&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;O1[C@@H](O)C[C@@H](O)[C@H](O)[C@H]1CO&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;O1[C@@H](O)[C@H](O)C[C@H](O)[C@H]1CO&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;O1[C@@H](O)[C@H](O)[C@@H](O)[C@H][C@H]1CO&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-6547527375398463547?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/6547527375398463547/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=6547527375398463547' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6547527375398463547'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6547527375398463547'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2008/01/hydrogen-bond-what-did-it-do-for-them.html' title='A hydrogen bond: What did it do for them? 
Part 2'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_9pt-rDMtsM4/R5OkeilFXoI/AAAAAAAAAAc/6MhY26wQwDs/s72-c/glucose.gif' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-518167429916263699</id><published>2007-12-30T17:44:00.000-08:00</published><updated>2010-09-26T15:07:40.650-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='molecular recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='hydrogen bonding'/><title type='text'>A hydrogen bond:  What did it do for them? Part 1</title><content type='html'>Drug discovery is both blessed and cursed with a wealth of folklore, rules and generalisations. While these provide comfort for the timid, it can be instructive to take a closer look at the basis for some of this folklore. The focus of &lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-will-it-do-for-me.html" target="_window"&gt;the previous Crapshoot&lt;/a&gt; was the assertion that a neutral-neutral hydrogen bond will contribute no more than 15-fold or 1.5kcal/mol to binding. This Crapshoot will examine some of the evidence.        &lt;br /&gt;&lt;br /&gt;We'll take a look at a &lt;a href="http://www.pnas.org/cgi/content/abstract/90/4/1172" target="_window"&gt;study of binding of small N-acetylated peptides to the antibiotic ristocetin A&lt;/a&gt;.  
This paper has been cited as supporting the &lt;a href="http://www3.interscience.wiley.com/cgi-bin/abstract/55000580/ABSTRACT" target="_window"&gt;assertion that a neutral-neutral hydrogen bond will contribute no more than 1.5kcal/mol to binding affinity.&lt;/a&gt;  The peptides in this study can be written as:&lt;br /&gt;&lt;br /&gt;Ac-X-X and Ac-X&lt;br /&gt;&lt;br /&gt;where X can be either glycine or D-alanine.  The peptides use their NHs as donors and their C-terminal carboxylates as anionic acceptors to interact with ristocetin A.   Contributions of hydrogen bonds to binding affinity are estimated by comparing binding free energies for a peptide and its truncated analogs, for example:&lt;br /&gt;&lt;br /&gt;Ac-Gly-Gly/Ac-Gly, Ac-Gly/acetate or Ac-D-Ala-D-Ala/acetate&lt;br /&gt;&lt;br /&gt;So far so good. But if you thought that you could get the contribution of a hydrogen bond just by subtracting a couple of free energies, think again, because you first need to account for hydrophobic interactions and the entropic cost of freezing rotatable bonds.  If you overweight the importance of the hydrophobic interactions, you'll underweight the contribution of the hydrogen bonds, because binding is a &lt;a href="http://en.wikipedia.org/wiki/Zero_sum" target="_window"&gt;zero sum&lt;/a&gt; game from the perspective of contributions: the measured binding free energy is fixed, so whatever is assigned to one interaction is not available to the others.  Alert readers will recall our comments on reference states in &lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-will-it-do-for-me.html" target="_window"&gt;the previous Crapshoot&lt;/a&gt;.     &lt;br /&gt;&lt;br /&gt;Anyway, if you work your way through the results presented in the paper, you'll find (see Table 1) that the contributions of amide-amide hydrogen bonds to binding range from 1.0kJ/mol to 12.5kJ/mol.  The authors appear worried about the higher value, which is a consequence of Ac-D-Ala-Gly binding 11kJ/mol more weakly than Ac-Gly-D-Ala despite a more favorable hydrophobic contribution.  
What could be causing this difference?  Perhaps the methyl group of Ac-Gly-D-Ala finds a hydrophobic concavity in the binding site or somehow compromises the solvation of the carboxylate. Does Ac-D-Ala-Gly bind in a higher energy conformation than Ac-Gly-D-Ala?     &lt;br /&gt;&lt;br /&gt;We hope you're still with us.  Now let's go back to the &lt;a href="http://www3.interscience.wiley.com/cgi-bin/abstract/55000580/ABSTRACT" target="_window"&gt;review featured in the previous Crapshoot&lt;/a&gt;, which asserts that a neutral-neutral hydrogen bond will contribute no more than 15-fold or a maximum of 1.5kcal/mol to binding affinity.  Now, if you convert 1.5kcal/mol (actually equivalent to about 12-fold at 300K) to kJ/mol (1cal = 4.184J), you get a figure of 6.3kJ/mol. This is considerably less than 12.5kJ/mol and also falls short of the figure of 7.7kJ/mol that is derived from the difference in binding free energies of Ac-Gly-D-Ala and acetate. &lt;br /&gt;&lt;br /&gt;Now, if you're going to use measurements like these to set upper limits for contributions of neutral-neutral hydrogen bonds to binding, there are some questions that you'll need to address.  Are the hydrogen bonds of optimal geometry?  Are the bound conformations strained?  How exposed are the binding partners to solvent?  Does hydrogen bond formation compromise the solvation of polar atoms that do not participate in the hydrogen bond?  How representative are amide-amide hydrogen bonds of all neutral-neutral hydrogen bonds?&lt;br /&gt;&lt;br /&gt;So there you have it: nine values for contributions of amide-amide hydrogen bonds derived from thermodynamic data.  Do they support the assertion that a neutral-neutral hydrogen bond will contribute no more than 1.5kcal/mol to binding affinity?  
We will leave it to you, the reader, to form your own opinion.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2008/01/hydrogen-bond-what-did-it-do-for-them.html"&gt;next&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;SMILES for Indexing&lt;br /&gt;&lt;span class="smiles"&gt;[O-]C(=O)C&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;CC(=O)N[C@H](C)C(=O)[O-]&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;CC(=O)NCC(=O)NCC(=O)[O-]&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;CC(=O)NCC(=O)N[C@H](C)C(=O)[O-]&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;CC(=O)N[C@H](C)C(=O)NCC(=O)[O-]&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;CC(=O)N[C@H](C)C(=O)N[C@H](C)C(=O)[O-]&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-518167429916263699?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/518167429916263699/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=518167429916263699' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/518167429916263699'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/518167429916263699'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-did-it-do-for-them.html' title='A hydrogen bond:  What did it do for them? 
Part 1'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-7497082616839629999</id><published>2007-12-23T17:20:00.000-08:00</published><updated>2010-09-26T15:07:40.663-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='molecular recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='hydrogen bonding'/><title type='text'>A hydrogen bond: What will it do for me?</title><content type='html'>In drug discovery, there is a long history (for example, see &lt;a href="http://dx.doi.org/10.1021/jm00334a001" target="_window"&gt;1&lt;/a&gt; and &lt;a href="http://dx.doi.org/10.1021/jm00378a021" target="_window"&gt;2&lt;/a&gt;) of associating structural changes with quantitative changes in activity. We were interested to read that a &lt;a href="http://www3.interscience.wiley.com/cgi-bin/abstract/55000580/ABSTRACT" target="_window"&gt;hydrogen bond between a neutral donor and neutral acceptor will contribute no more than a factor of 15 to binding affinity&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;This is really interesting.  Those of you who read the &lt;a href="http://kinasepro.wordpress.com" target="_window"&gt;kinasepro&lt;/a&gt; column will know that the IP situation in the kinase area is challenging because of the huge number of patents.  You'll also know that many kinase inhibitors interact with the hinge region through neutral-neutral hydrogen bonds.  Now that quoted 15-fold figure actually represents an upper limit. 
So there you have it: dump the hydrogen bond and think of all the brand new, bright and shiny chemical space you can have all to yourself if you just think outside the box.  And don't forget to check &lt;a href="http://www.coronene.com" target="_window"&gt;Carbon-Based Curiosities&lt;/a&gt; if bright and shiny is your sort of thing.&lt;br /&gt;&lt;br /&gt;Now before you bet your project (and maybe even your company) on the magic 15-fold upper limit, should we perhaps take a closer look at where this figure comes from?  How much data is it based on and what are the relevant reference states?  We will attempt to answer these questions in a short series of Crapshoots.  Until then, all that needs to be said is Happy Christmas from The Crapshoot, The Blue Team and the Red Team.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-did-it-do-for-them.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-7497082616839629999?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/7497082616839629999/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=7497082616839629999' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7497082616839629999'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7497082616839629999'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/12/hydrogen-bond-what-will-it-do-for-me.html' title='A hydrogen bond: What will it do for me?'/><author><name>Georg-Martin 
Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5310938959865660607</id><published>2007-12-15T17:34:00.000-08:00</published><updated>2010-09-26T15:07:40.667-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oral drugs'/><title type='text'>The Crapshoot so far</title><content type='html'>The first phase of The Crapshoot is now complete and we will move on to other topics.  To date we have looked at the properties of oral drugs and we'll take a quick look back at the journey so far. &lt;br /&gt;&lt;br /&gt;Size and lipophilicity appear to be accepted as the most important determinants of a molecule's fate in its quest to become a Marketed Orally Active Drug. Hydrogen bonding, quantified either as a count of donors and acceptors or by the curious polar surface area,  frequently makes an appearance in these discussions. However, the connection between these properties and in vivo exposure is not particularly strong, even if the trends are highly significant.  This appears to have prompted some &lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-2-sacred-cow-culled.html" target="_window"&gt;unorthodox approaches to data analysis&lt;/a&gt; which, being simple folk, we admit to being greatly confused by. &lt;br /&gt;&lt;br /&gt;There are still many unanswered questions.  Why does ionization not appear to be important in these analyses? Is octanol really the most appropriate solvent with which to model the membrane interior or hydrophobic pockets in proteins?  Why do hydrogen bond donors appear to be different to acceptors if it's all just desolvation?  
Should we use logP or logD to quantify lipophilicity?&lt;br /&gt;&lt;br /&gt;The analyses of druglikeness reviewed in The Crapshoot to date have typically looked at &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5.html" target="_window"&gt;property distributions in databases of drugs&lt;/a&gt;.  Drugs can also be compared with non-drugs and we will review some of the relevant literature at some point in the future.  At this point we simply suggest that our choices of non-drugs can influence our views of druglikeness. &lt;br /&gt;&lt;br /&gt;Readers of this column may be familiar with the famous dictum of &lt;a href="http://en.wikipedia.org/wiki/Paul_Ehrlich" target="_window"&gt; Ehrlich&lt;/a&gt;, "Corpora non agunt nisi fixata" (substances do not act unless they are bound). Substances must bind for their effects to be observed, and if you're into water memory and homeopathy we suggest that, for your education and edification, you read &lt;a href="http://waterinbiology.blogspot.com/2007/08/bad-memory.html "&gt;this post in Water in Biology&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Ehrlich's dictum could be seen as a first law of druglikeness and we are surprised that the requirement for binding is not explicitly stated more often in discussions on the subject.  It is actually quite easy to design molecules with good aqueous solubility but getting these to bind where you want them to is rather more of a problem.  If the natural ligand for your target protein is lipophilic, it is likely that a competitive ligand will also have to be lipophilic in order to compete.  However, the natural ligand is unlikely to be inconvenienced by having to get from the GI tract to plasma to cell to intracellular compartment. Are we blinkered by druglikeness when we should be thinking about drugability?  &lt;br /&gt;&lt;br /&gt;The focus of the Crapshoot will shift to the interactions between drugs and their targets, starting with a look at hydrogen bonding. 
The Blue Team and Red Team will return to their respective winter training camps to prepare for further gripping contests in the spring.  We will return to the druglikeness theme at some stage, provided that we can bear ploughing through literature which, we are not ashamed to admit, does not set our pulses racing. For now, just see if you can spot any &lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-1-categorical-sins.html" target="_window"&gt;categorical sins &lt;/a&gt;in &lt;a href="http://dx.doi.org/10.1038/nrd2445" target="_window"&gt;this recent review&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5310938959865660607?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5310938959865660607/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5310938959865660607' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5310938959865660607'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5310938959865660607'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/12/crapshoot-so-far.html' title='The Crapshoot so far'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5249986795106101470</id><published>2007-11-25T11:19:00.000-08:00</published><updated>2010-09-26T15:07:40.669-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='gossip'/><category scheme='http://www.blogger.com/atom/ns#' term='red team'/><category scheme='http://www.blogger.com/atom/ns#' term='blue team'/><title type='text'>Misadventures in Reciprocal Space</title><content type='html'>Transporters are not on the Christmas card list of your average Pharma scientist.  They will pump your offerings out of the cells and compartments into which you would like them to go.  They are, however, essential for normal physiological function and serve an important protective role.    &lt;br /&gt;&lt;br /&gt;At the start of the featured exchange we find The Blue Team in a strong defensive position with five publications (&lt;a href="http://dx.doi.org/10.1126/science.293.5536.1793" target="_window"&gt;1&lt;/a&gt; | &lt;a href="http://dx.doi.org/10.1016/S0022-2836(03)00587-4" target="_window"&gt;2&lt;/a&gt; | &lt;a href="http://dx.doi.org/10.1073/pnas.0400137101" target="_window"&gt;3&lt;/a&gt; | &lt;a href="http://dx.doi.org/10.1126/science.1107733" target="_window"&gt;4&lt;/a&gt; | &lt;a href="http://dx.doi.org/10.1126/science.1119776" target="_window"&gt;5&lt;/a&gt;), spanning four years, on the MsbA and EmrE transporters.   The journals concerned have high impact factors and it would seem that all is quiet on the western front.&lt;br /&gt;&lt;br /&gt;Maybe not that quiet and one should remember that the &lt;a href="http://en.wikipedia.org/wiki/Maginot_Line" target="_window"&gt;Maginot Line&lt;/a&gt; was truly formidable when viewed from an anterior perspective. 
The Red Team launches a &lt;a href="http://dx.doi.org/10.1038/nature05155" target="_window"&gt;well-prepared flanking manoeuvre&lt;/a&gt;.  Their structure for Sav1866 looks rather different to that of The Blue Team’s MsbA: sufficiently different to suggest convergent evolution of the two proteins, provided of course that the differences are indeed real.  &lt;br /&gt;&lt;br /&gt;The front collapses in disarray and soon a &lt;a href="http://dx.doi.org/10.1126/science.314.5807.1875b" target="_window"&gt;white flag &lt;/a&gt;hangs forlornly at The Blue Team’s command post.  An in-house data reduction program has introduced a sign change and the reported structures have the wrong hand.  No doubt the shock troops of Open Source are saying, “We told you so!” but we just ask how the formidable &lt;a href="http://gmc2007.blogspot.com/search/label/rule%20of%202" target="_window"&gt;Lady Bracknell&lt;/a&gt; would have loaded the orthonormal basis vectors of misfortune and carelessness.&lt;br /&gt;&lt;br /&gt;Were there earlier signs of weakness in the defensive line?  The Blue Team’s MsbA structure appears to be incompatible with &lt;a href="http://dx.doi.org/10.1096/fj.03-0107fje" target="_window"&gt;disulfide cross-linking studies of P-gp&lt;/a&gt;.  Shouldn’t be a problem, because a crystallographic structure is fact and cross-linking studies are cross-linking studies.  However, The Red Team acknowledges that its Sav1866 structure is consistent with these earlier cross-linking studies.&lt;br /&gt;&lt;br /&gt;There are a number of lessons to be learned from this Cautionary Tale.  Firstly, in using protein crystallography, there is a need to be able to separate fact, interpretation of fact and fiction. Secondly, one should not let journal impact factor get in the way of one’s critical thinking.  
And, lastly, The Blue Team sometimes comes second.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5249986795106101470?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5249986795106101470/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5249986795106101470' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5249986795106101470'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5249986795106101470'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/11/misadventures-in-reciprocal-space.html' title='Misadventures in Reciprocal Space'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-8043224436629917787</id><published>2007-11-10T12:43:00.000-08:00</published><updated>2012-01-27T13:11:53.698-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='az'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='gossip'/><category scheme='http://www.blogger.com/atom/ns#' term='red team'/><category scheme='http://www.blogger.com/atom/ns#' term='vertex'/><category scheme='http://www.blogger.com/atom/ns#' term='blue team'/><title type='text'>Cambridge one, Gothenburg nil</title><content 
type='html'>Economics has sometimes been called the dismal science.  Whoever coined that term would appear to be unfamiliar with the literature of virtual screening.&lt;br /&gt;&lt;br /&gt;It all started with the Blue Team’s &lt;a href="http://dx.doi.org/10.1002/prot.20088" target="_window"&gt;comparison of docking and scoring methods&lt;/a&gt;.   The &lt;a href="http://dx.doi.org/10.1021/ci0503255" target="_window"&gt;Red Team’s analysis&lt;/a&gt; came to different conclusions.  In particular, the Red Team’s favourite docking program didn’t appear to have performed as well in the hands of the Blue Team.  The Red Team suggested that receptor preparation by the Blue Team may not have been optimal for the Red Team’s favourite docking program.&lt;br /&gt;&lt;br /&gt;The Red Team soon discovers that baiting an opponent is best done from a position of strength, as the &lt;a href="http://dx.doi.org/10.1021/ci600460h" target="_window"&gt;Blue Team’s response&lt;/a&gt; is swift and decisive.  They re-run their analysis using the latest version of the Red Team’s favourite docking program and, while they note an improvement on their initial results with the software, this still falls well short of what had been claimed by the Red Team.  They also get the developer of the Red Team’s favourite docking program to check things out and he is also unable to achieve the success of the Red Team.  As one of Macbeth’s witches may have said, ‘By the pricking of my thumbs, something aromatic this way comes’.&lt;br /&gt;&lt;br /&gt;The play is now deep in the Red Team’s half and their &lt;a href="http://dx.doi.org/10.1021/ci7003169" target="_window"&gt;response under pressure &lt;/a&gt;is somewhat underwhelming.  It turns out that the Red Team’s favourite software is now behaving more as it did in the hands of the Blue Team.&lt;br /&gt;&lt;br /&gt;Now take a really close look at the Red Team’s response.  
Note the order of the authors in the Red Team’s original paper and in the retraction of the results for their favourite docking program.  Careless?  Maybe not?&lt;br /&gt;&lt;br /&gt;Don’t you just love it when it gets bitchy!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-8043224436629917787?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/8043224436629917787/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=8043224436629917787' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8043224436629917787'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8043224436629917787'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/11/cambridge-one-gothenburg-nil.html' title='Cambridge one, Gothenburg nil'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-1996768387691689908</id><published>2007-11-01T12:43:00.000-07:00</published><updated>2010-09-26T15:07:40.682-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gossip'/><category scheme='http://www.blogger.com/atom/ns#' term='pharmacokinetics'/><title type='text'>Hepatic Extraction</title><content type='html'>It appears that some &lt;a href="http://www.schmutzie.com/2007/09/814-great-mofo-delurk-2007.html" 
target="_window"&gt;lament the declining number of comments &lt;/a&gt;on their blogs.  Comments are passé and so-2006 and we rarely bother.  Recently The Crapshoot received its first death threat.  Not any old death threat, but a slow and excruciatingly painful death threat involving liver extraction.  Manual liver extraction in fact and it was the death rather than the threat to which temporal reference was being made.  We feel so important since we never thought anyone actually cared that much.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-1996768387691689908?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/1996768387691689908/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=1996768387691689908' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1996768387691689908'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1996768387691689908'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/11/hepatic-extraction.html' title='Hepatic Extraction'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-1417028855288479181</id><published>2007-10-27T14:06:00.000-07:00</published><updated>2010-09-26T15:07:40.684-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oral 
drugs'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='molecular descriptors'/><title type='text'>Rotatable bonds 3: Savaged by a dead sheep?</title><content type='html'>Two years after the publication of the &lt;a href="http://dx.doi.org/10.1021/jm020017n" target="_window"&gt;article featured &lt;/a&gt;in the &lt;a href="http://gmc2007.blogspot.com/2007/10/rotatable-bonds-2-sacred-cow-culled.html" target="_window"&gt;previous Crapshoot&lt;/a&gt;, a &lt;a href="http://dx.doi.org/10.1021/jm0306529" target="_window"&gt;critique&lt;/a&gt; appeared in the same journal.   The critique started well, noting scepticism about being able to use simple molecular descriptors to predict complex properties.  Eagerly we read on, hoping to have some of the data analytic mysteries of the original article explained to us.  To cut a long story short, we were still hoping when we got to a section of the article entitled ‘References’.  &lt;br /&gt;&lt;br /&gt;The authors of the critique introduced their own data set of 434 compounds and used two methods to count rotatable bonds which, rather unfortunately, gave different results.  By checking numbers of rotatable bonds (NROT) for some of the non-proprietary compounds given in the original article, they concluded that one of the methods gave NROT values that were essentially identical to those in the earlier publication.  But essentially identical is not identical, and we don’t get to find out what ‘essentially’ means in quantitative terms.  We also find out that different methods of calculating polar surface area (PSA) can give different results, but that particular sacred cow will have to wait until a future post for the captive bolt.&lt;br /&gt;&lt;br /&gt;We’re now ready to apply the filters described in the first paper.  
In the original article, ~80% (&lt;a href="http://pubs.acs.org/isubscribe/journals/jmcmar/45/i12/figures/jm020017nf00004.html" target="_window"&gt;see Figure 4&lt;/a&gt;)  of compounds with NROT ≤ 10 and PSA ≤ 140Å² had bioavailability ≥ 20%.  The critique reported that only 70% of compounds satisfying these criteria had bioavailability exceeding 20%.  What are we to make of this?&lt;br /&gt;&lt;br /&gt;First, just pretend that both groups have used identical methods to calculate NROT and PSA. There is still the issue of how different 70% and ~80% really are.  When you’re looking at means (or at proportions like these), you can derive estimates for the uncertainties in what you measure and use these to see if two measurements are significantly different.  Neither group provides us with any measure of the uncertainty in the 70% or ~80% that they quote.  We are simple folk and easily confused (dare we say intimidated) by all this clever quantitative stuff and would greatly appreciate somebody explaining to us why these figures really are different.&lt;br /&gt;&lt;br /&gt;In the previous post, we posed a number of questions about the data analysis used.  We were surprised that the authors of the critique chose not to ask similar questions.  Does this mean that they believe the original analysis to be completely valid?  Differences in how PSA is calculated are fascinating (like &lt;a href="http://en.wikipedia.org/wiki/Train_spotting" target="_window"&gt;train-spotting&lt;/a&gt;) but we would have thought that some more probing questions might have been asked in a critique. &lt;br /&gt;&lt;br /&gt;So there you have it.  Two different groups have analysed two different bioavailability databases, using different methods to calculate descriptors, and they have got different results.  And the results may not actually be that different.  Don’t worry, we’re just as confused as you!&lt;br /&gt;&lt;br /&gt;But what does all this have to do with dead sheep?  
The parliamentarian &lt;a href="http://en.wikipedia.org/wiki/Denis_Healey" target="_window"&gt;Denis Healey&lt;/a&gt; once likened an attack by his opponent &lt;a href="http://en.wikipedia.org/wiki/Geoffrey_Howe" target="_window"&gt;Geoffrey Howe&lt;/a&gt; to being savaged by a dead sheep.  Dispatching opponents with droll one-liners was a particular specialty of &lt;a href="http://en.wikipedia.org/wiki/Churchill" target="_window"&gt;Churchill&lt;/a&gt;.  He described &lt;a href="http://en.wikipedia.org/wiki/Attlee" target="_window"&gt;Attlee&lt;/a&gt; as a modest man with much to be modest about, and the following exchanges with &lt;a href="http://en.wikipedia.org/wiki/Lady_Astor" target="_window"&gt;Lady Astor&lt;/a&gt; further illustrate this:&lt;br /&gt;&lt;br /&gt;Lady Astor: Sir, you are very drunk.&lt;br /&gt;Churchill: Madam, you are very ugly, but I will be sober in the morning.&lt;br /&gt;&lt;br /&gt;Lady Astor: If you were my husband, I would poison your coffee.&lt;br /&gt;Churchill: If you were my wife, I would drink it.&lt;br /&gt;&lt;br /&gt;This concludes our look at rotatable bonds.  Next we will provide some relief from the turgid literature reviews in the form of a cautionary tale or two.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-1417028855288479181?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/1417028855288479181/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=1417028855288479181' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1417028855288479181'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1417028855288479181'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/10/rotatable-bonds-3-savaged-by-dead-sheep.html' title='Rotatable bonds 3: Savaged by a dead sheep?'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-8566665433514898124</id><published>2007-10-14T13:05:00.000-07:00</published><updated>2010-09-26T15:07:40.691-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sacred cows'/><category scheme='http://www.blogger.com/atom/ns#' term='categorical sin'/><category scheme='http://www.blogger.com/atom/ns#' term='oral drugs'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='molecular descriptors'/><title type='text'>Rotatable Bonds 2: A Sacred Cow Culled?</title><content type='html'>We are now in a position to review the &lt;a href="http://dx.doi.org/10.1021/jm020017n" target="_window"&gt;first of our featured articles&lt;/a&gt;.  This heavily cited study of molecular properties that influence oral bioavailability claims that polar surface area and the number of rotatable bonds (NROT) are both good predictors of bioavailability, independent of molecular weight (MW).  As we pointed out in the previous post, NROT tends to correlate with molecular weight.  
So we were extremely curious as to how bioavailability could be shown to depend more on the flexibility of molecules than on their size.&lt;br /&gt;&lt;br /&gt;A good place to start is Table 2, which shows correlation coefficients of bioavailability (%F) with MW and NROT of -0.35 and -0.39 respectively for the full data set of 1117 compounds.  The data set is then sliced into 3 categories (MW &lt; 400, MW &gt; 500 and everything else) and correlation coefficients are also quoted for these groups.  It is no surprise that the correlation of %F with MW is weaker within each of the groups.  However, the authors note clear relationships (correlation coefficients of -0.40 and -0.34) between %F and NROT for the two highest molecular weight categories.&lt;br /&gt;&lt;br /&gt;It is easy enough to calculate correlation coefficients, but these statistics tend to be most meaningful when the relevant variables are normally distributed.  The data points furthest from the average have the greatest influence on the correlation coefficient, and within a slice the range of MW is restricted by construction, so it is not surprising that the correlations of %F with MW are weaker for each of the 3 MW categories than for the entire data set.  NROT is not perfectly correlated with MW, so its distribution doesn’t get chopped as drastically by the categorisation process and its correlations with %F don’t drop as much.  We think an interesting control would have been to split the data into 3 groups by NROT and then look at correlations of %F with MW and NROT.  We expect that it would now be the correlations with NROT that weakened, while the correlations with MW were less affected by the categorisation of the data.  In essence, this analysis is asymmetric in how it treats these two potential descriptors of bioavailability.  So when the descriptors behave differently, does that reflect something meaningful or is it just a result of the asymmetric treatment of the descriptors?    
&lt;br /&gt;&lt;br /&gt;The correlation coefficients were not overly convincing, so let’s take a look at the other evidence.  The data were also categorised by bioavailability into two groups: %F &lt; 20 and %F ≥ 20.  Just remember that when you categorise like this, a bioavailability of 19% is treated the same as a bioavailability of 1% and differently from a bioavailability of 21%.  Categorisation distorts relationships.  Anyway, with that health warning, let’s just accept that the categorisation of %F is OK and take a look at Figure 1.&lt;br /&gt;&lt;br /&gt;Figure 1 claims to show that the effect of molecular rigidity is independent of molecular weight.  The data set is now split by MW into two groups (MW ≥ 500, MW &lt; 500) and by NROT into three groups (NROT ≤ 7, 7 &lt; NROT ≤ 10, NROT &gt; 10).  The proportions of compounds in each of the NROT categories with %F of at least 20 are compared for the two MW categories.  In each case the proportions appear to be the same for the two MW categories, and this is presented as evidence that the effect of molecular rigidity is independent of molecular weight.&lt;br /&gt;&lt;br /&gt;We are just simple folk and really don’t know what to make of all this slicing and dicing of the data.  To be honest, we got a bit lost once we went from three MW categories to two.  Why couldn’t we just have a plot of %F against MW and another plot of %F against NROT instead of all the categorical gymnastics?  That would also treat the descriptors in a symmetric manner, which one might argue is essential if you’re going to come to conclusions about which is the more important determinant of bioavailability.&lt;br /&gt;&lt;br /&gt;Now let’s go back and take another look at &lt;a href="http://pubs.acs.org/isubscribe/journals/jmcmar/45/i12/figures/jm020017nf00001.gif" target="_window"&gt;Figure 1&lt;/a&gt;.  
The distributions for each of the NROT categories do indeed look very similar for the two MW categories, but that doesn’t mean that there is no dependence of %F on MW.  Nor does it mean that the correlation between MW and NROT has miraculously disappeared.  You just need to know where to look.  &lt;br /&gt;&lt;br /&gt;And where to look is the bottom of &lt;a href="http://pubs.acs.org/isubscribe/journals/jmcmar/45/i12/figures/jm020017nf00001.gif" target="_window"&gt;Figure 1&lt;/a&gt; where it says “n =”.  Using these figures you can work out the fraction of compounds in each NROT category with MW ≥ 500.  We find that only 14% of the compounds in the NROT ≤ 7 category have MW ≥ 500, but this figure rises to 72% for the NROT &gt; 10 category.  In other words, moving up the NROT categories also means moving up in MW.&lt;br /&gt;&lt;br /&gt;A penetrating insight into the complex world of oral bioavailability or categorical sin?  Is the effect real or an illusion created by the asymmetric manner in which the descriptors have been treated?  It is not for us to say and we leave it to you, the reader, to decide.  This article did generate some commentary in the literature and in the next post we will take a closer look at some of that.&lt;br /&gt;&lt;br /&gt;If you got this far in a long and particularly turgid literature review, we salute your stamina while respectfully suggesting that you get a life.  
Nevertheless, we hope that you’ll drop by again sometime soon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-8566665433514898124?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/8566665433514898124/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=8566665433514898124' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8566665433514898124'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8566665433514898124'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/10/rotatable-bonds-2-sacred-cow-culled.html' title='Rotatable Bonds 2: A Sacred Cow Culled?'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-8682234451984229008</id><published>2007-10-10T14:21:00.000-07:00</published><updated>2010-09-26T15:07:05.976-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sacred cows'/><category scheme='http://www.blogger.com/atom/ns#' term='oral drugs'/><category scheme='http://www.blogger.com/atom/ns#' term='data analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='molecular descriptors'/><title type='text'>Rotatable Bonds 1: Categorical Sins</title><content type='html'>In this post we start to look at molecular flexibility as a determinant of oral bioavailability. 
 Too much flexibility, as any medicinal chemist knows, is A Bad Thing and rotatable bonds are most definitely of The Dark Side.  Well, sort of A Bad Thing, because too few rotatable bonds are likely to give you a &lt;a href="http://coronene.blogspot.com/"&gt;Carbon Based Curiosity&lt;/a&gt; rather than an orally dosed drug.  Now a molecule with lots of rotatable bonds needs to be quite large to accommodate all those bonds, and the Rule of 5 tells us that being too large is also A Bad Thing.  So your drug sucks.  Is it the size or the bonds?  &lt;br /&gt;&lt;br /&gt;Bioavailability is A Good Thing because without it you don’t have an oral drug.  It is normally given as a percentage and is a composite property that quantifies how well your drug is absorbed from the gut and how adept it is at evading the metabolic guardians of the body, who mainly hang out in the liver.  So you want to figure out whether it is the rotatable bonds or the size of the molecule that controls bioavailability.  One thing you could do is plot bioavailability against number of rotatable bonds and against molecular weight and see which descriptor better fits the measured data.  Alternatively, because bioavailability is a fraction bounded by 0 and 100%, you could transform it (for example, with a logit) before fitting.  If the fit doesn’t look linear you might even try a bit of curve-fitting.  Hopefully you’ll agree that these are sensible ways to start.  However, we need to digress a bit before we can introduce the first of the featured articles, and we beg your indulgence.&lt;br /&gt;&lt;br /&gt;If you work long enough in the pharmaceutical industry you’ll come across some particularly creative forms of data analysis.  One very common approach is to categorise continuous data.  For example, we might classify activity as HIGH (IC50 &lt; 100 nM), LOW (IC50 &gt; 1 µM) and MEDIUM (everything else).  
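&lt;br /&gt;&lt;br /&gt;To see what this kind of binning does, here is a minimal Python sketch of the three-way classification just described.  The function names are ours and the thresholds are simply the ones in the example.  It prints each IC50 alongside its pIC50 (9 minus log10 of the IC50 in nM), which shows what the categories hide.&lt;pre&gt;
```python
import math


def categorise_ic50(ic50_nm):
    """HIGH below 100 nM, LOW above 1 µM (1000 nM), MEDIUM for everything else."""
    if 100 > ic50_nm:
        return "HIGH"
    if ic50_nm > 1000:
        return "LOW"
    return "MEDIUM"


def pic50(ic50_nm):
    """Convert an IC50 in nM to pIC50 (negative log10 of the molar IC50)."""
    return 9 - math.log10(ic50_nm)


for ic50 in (99, 101, 999, 1001):
    print(f"{ic50:5d} nM  pIC50 {pic50(ic50):.2f}  {categorise_ic50(ic50)}")
```
&lt;/pre&gt;Compounds at 99 nM and 101 nM are, for all practical purposes, equipotent but land in different bins, while 101 nM and 999 nM differ by almost a full log unit and share a bin.  Once the data are categorised, that variation is gone.&lt;br /&gt;&lt;br /&gt;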
Categorising makes some sense when dealing with an assay with low dynamic range, where a significant proportion of the measurements are above or below the limits of quantification, but you still have to be careful.  However, categorising continuous data like this has a dark side because it hides variation.  Is variation A Good Thing or A Bad Thing?  It depends on your perspective.  If you’re trying to flog your favourite molecular descriptor to a sceptical audience, variation is definitely A Bad Thing because it sows the seeds of doubt.  However, if you’re looking for truth, you won’t know whether you’ve found it if you discard variation.  Our advice is to sniff for large rodents if presented with any analysis where continuous data have been treated in this manner.  Somebody is probably trying to hide something very aromatic and unpleasant. &lt;br /&gt;&lt;br /&gt;In the next post we will review a heavily cited study of the influence of rotatable bonds on oral bioavailability.  Needless to say, that analysis applies a significant amount of categorisation to the data.  
We can barely contain ourselves.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-8682234451984229008?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/8682234451984229008/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=8682234451984229008' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8682234451984229008'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8682234451984229008'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/10/rotatable-bonds-1-categorical-sins.html' title='Rotatable Bonds 1: Categorical Sins'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-3408081234713014056</id><published>2007-10-01T13:54:00.000-07:00</published><updated>2010-09-26T15:07:06.003-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='opinion'/><category scheme='http://www.blogger.com/atom/ns#' term='organisational'/><category scheme='http://www.blogger.com/atom/ns#' term='pharma life'/><title type='text'>Pharma Life 1: The Leadership of Opinion</title><content type='html'>Big Pharma tends to be somewhat conservative and inward-looking, propelled, in accordance with &lt;a href="http://en.wikipedia.org/wiki/Newton%27s_first_law" target="_window"&gt;Newton’s first law&lt;/a&gt;, 
by the inertia of internal bureaucracy and process.  Periodically they wake up and take notice of the outside world.  For example, a senior manager might spot an article in Leadership Weekly that claims to show the direct relevance of semiconductor manufacturing processes to pharmaceutical research.  A new organisational model may be imported as a means to distract The Great Unwashed from more pressing concerns.  But if you really want to solve your problems, you need to find yourself an Opinion Leader.&lt;br /&gt;&lt;br /&gt;Readers of this column will recall that we once &lt;a href="http://gmc2007.blogspot.com/2007/06/rule-of-5-sociological-fallout.html" target="_window"&gt;likened opinions to haemorrhoids&lt;/a&gt; and you can read that piece if you want to find out why, because we’re not going there again.  Opinion Leaders can be found both in Pharma and in academia, although the industrial variety often exists only in his/her own imagination and that of their management.  Opinion Leaders in academia tend to have large, active research groups with lots of post-docs and graduate students and don’t normally find industry a particularly appealing prospect.  That leaves the other option, which is to poach a competitor’s Opinion Leader.&lt;br /&gt;&lt;br /&gt;Generally, the decision to recruit an Opinion Leader is a managerial rather than a scientific one.  Typically, an industrial Opinion Leader will be identified on the basis of speaking at conferences and publishing review articles as Current Opinions.  Our humble advice to managers seeking Opinion Leaders is to look at the 5 most recent publications of the potential recruit.  If 3 or more are review articles, it is probable that the Opinion Leader’s shelf life has expired.  Do not assume that a Pharma scientist who publishes a lot and speaks at a lot of conferences is making huge contributions to drug discovery.  
For reasons best known to the companies that employ them, some individuals appear to be able to devote a huge proportion of their time to external activities and may be better known outside their organisations than within.  It can be quite revealing to look at their patents.  Finally, it is not unknown for managerial types to exploit reporting relationships to enhance their publication records.  &lt;br /&gt;&lt;br /&gt;Once the Opinion Leader is in, there is little that can be done short of asking if your friendly patch-clamper can spare some &lt;a href="http://en.wikipedia.org/wiki/Tetrodotoxin" target="_window"&gt;tetrodotoxin&lt;/a&gt;.  Our Leaders have spent a lot of money recruiting this important person and don’t want the valuable opinions to be challenged by ungrateful colleagues.  Also, don’t complain when the newly recruited Opinion Leader goes to five times as many conferences as everybody else.  Some lead and the rest follow.  It has always been that way, so will you stop asking tiresome questions and get on with being led.  However, look for the signs that all is not well.  Look for project managers being instructed to impose the Opinion Leader’s opinions, because this shows how much their managers have lost their nerve.  Is the Opinion Leader too important to present their opinions internally? &lt;br /&gt;&lt;br /&gt;This is the first of our Pharma Life columns.  In the next post we will return to the turgid literature reviews with which readers of this column will be familiar.  We will attempt to cull one of the most sacred cattle in the pasture and have been waiting some time for this thrill.  
Captive bolt will meet rotatable bonds and it could get messy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-3408081234713014056?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/3408081234713014056/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=3408081234713014056' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3408081234713014056'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3408081234713014056'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/10/big-pharma-tends-to-be-somewhat.html' title='Pharma Life 1: The Leadership of Opinion'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-7495388550073253206</id><published>2007-09-13T14:03:00.000-07:00</published><updated>2010-09-26T15:07:06.018-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='stamp collecting'/><category scheme='http://www.blogger.com/atom/ns#' term='privileged fragment'/><category scheme='http://www.blogger.com/atom/ns#' term='privileged substructure'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>Side chains:  Patterns in the shrubbery?</title><content type='html'>In the &lt;a 
href="http://gmc2007.blogspot.com/2007/08/frameworks-and-philately.html" target="_window"&gt;previous post&lt;/a&gt;, we reviewed a &lt;a href="http://dx.doi.org/10.1021/jm9602928" target="_window"&gt;study of molecular frameworks&lt;/a&gt; in drug molecules.  This work was &lt;a href="http://dx.doi.org/10.1021/jm9903996" target="_window"&gt;extended to side chains&lt;/a&gt; 3 years later.  Side chains are defined by working inwards from the terminal atoms (those with a single connection to a non-hydrogen atom) at the periphery of the molecule until a ring or linker atom is encountered.  The atom to which the side chain is linked is also included in the definition, so methyl on nitrogen is considered to be distinct from methyl on carbon.  The study of side chains was based on 5090 drugs, but the earlier article on frameworks referred to a data set of 5120 drugs.  Don’t worry if you’re confused by this, because so are we.    &lt;br /&gt;&lt;br /&gt;The frequency of occurrence of pairs of side chains was also studied.  Readers of the previous Crapshoot will recall that piperidine was found more frequently (12) linked to benzene than as a framework in its own right (5).  We were puzzled that the authors did not study the pairwise occurrence of rings in the earlier work.  In this study, the authors simply report the pair distributions for the 25 most commonly found side chains.  Pair distributions are not particularly meaningful unless viewed in the context of the individual distributions for the two objects that define the pair.  A pair of side chains may rarely be found together in drugs simply because the side chains themselves are rare.  However, they may be found together more frequently than you would expect from the occurrences of the individual side chains.  
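&lt;br /&gt;&lt;br /&gt;To make this concrete, suppose side chain A occurs in n_A of N drugs, side chain B in n_B, and the two occur together in n_AB.  If A and B turned up independently, we would expect about n_A × n_B / N drugs to carry both.  The Python sketch here lays the counts out as a 2×2 table and computes that expected count, a Pearson chi-square statistic and an odds ratio.  The counts are invented for illustration and the function names are ours; with expected counts as small as these, an exact test (such as Fisher’s) would be the safer choice.&lt;pre&gt;
```python
def pair_table(n_total, n_a, n_b, n_ab):
    """2x2 counts as (A and B, A only, B only, neither)."""
    return n_ab, n_a - n_ab, n_b - n_ab, n_total - n_a - n_b + n_ab


def expected_pair_count(n_total, n_a, n_b):
    """Drugs expected to carry both side chains if A and B occur independently."""
    return n_a * n_b / n_total


def chi_square_2x2(n_total, n_a, n_b, n_ab):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    a, b, c, d = pair_table(n_total, n_a, n_b, n_ab)
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def odds_ratio(n_total, n_a, n_b, n_ab):
    """(A and B)(neither) / (A only)(B only); raises ZeroDivisionError if either off-diagonal cell is zero."""
    a, b, c, d = pair_table(n_total, n_a, n_b, n_ab)
    return (a * d) / (b * c)


# Invented counts, not taken from the paper.
N, N_A, N_B, N_AB = 100, 20, 10, 6
print(expected_pair_count(N, N_A, N_B))
print(round(chi_square_2x2(N, N_A, N_B, N_AB), 2))
print(round(odds_ratio(N, N_A, N_B, N_AB), 2))
```
&lt;/pre&gt;Here the pair turns up three times as often as independence would predict (6 observed against 2 expected), and a chi-square of about 11 on one degree of freedom is well above the 5% critical value of 3.84, although the small expected count is exactly why an exact test would be preferable.  A rare pair can still be over-represented and a common pair under-represented, which is the information a bare pair count leaves out.&lt;br /&gt;&lt;br /&gt;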
&lt;a href="http://en.wikipedia.org/wiki/Contingency_table" target="_window"&gt;Contingency tables&lt;/a&gt; are but one of a number of ways to analyse this type of data, and it is a mystery to us why this has not been done.&lt;br /&gt;&lt;br /&gt;While the authors have been completely open about how compounds were retrieved from a commercial database, it is not clear how compounds had originally been selected for inclusion in that database.  According to Chart 1, the 14th most commonly found side chain is nitro (137 occurrences).  How many of these compounds are marketed oral drugs?  How many of these nitro-containing compounds have actually been dosed in humans?  As the nitro group is so commonly found in drugs, should we be synthesizing more nitro compounds? &lt;br /&gt;&lt;br /&gt;Defining side chains in this manner raises the question of context.  The linking atom is included in the definition, which we believe to be a necessary, but not sufficient, condition for specifying context.  The most commonly found side chain was oxygen doubly bonded to carbon, a definition that covers aldehydes, ketones, quinones, acid chlorides and anhydrides.  We suspect that many of the carbonyl groups in drugs are linked to nitrogen, but the analysis leaves this to our imagination.  Do we believe that adding a carbonyl group to our leads will improve their chances of becoming drugs?  How common are carbonyl groups in failed drugs?&lt;br /&gt;&lt;br /&gt;We don’t understand how it is possible to look at some features of drug molecules and decide whether molecules are drugs because of those features or in spite of them.  However, we are simple folk and surely it is not for us to question what is written in the literature.&lt;br /&gt;&lt;br /&gt;We will take a break from privileged substructures for now, although we expect to return to the theme at a later date.  We hope your day has been enriched by this review.  
In the next post we will pause to reflect on life in pharma and after that it's back to the sacred cows.&lt;br /&gt;&lt;br /&gt;Moo!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-7495388550073253206?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/7495388550073253206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=7495388550073253206' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7495388550073253206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7495388550073253206'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/09/side-chains-patterns-in-shrubbery.html' title='Side chains:  Patterns in the shrubbery?'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-4916447376001071117</id><published>2007-08-30T13:49:00.000-07:00</published><updated>2011-01-09T04:26:53.886-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='stamp collecting'/><category scheme='http://www.blogger.com/atom/ns#' term='privileged fragment'/><category scheme='http://www.blogger.com/atom/ns#' term='privileged substructure'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>Frameworks and Philately</title><content type='html'>It was &lt;a 
href="http://en.wikipedia.org/wiki/Ernest_Rutherford" target="_window"&gt;Lord Rutherford&lt;/a&gt; who so neatly partitioned science into physics and stamp collecting.  We wonder whether reading medicinal chemistry journals in the early 21st century would have weakened or strengthened his opinion.  &lt;br /&gt;&lt;br /&gt;In this post, we examine a well-cited &lt;a href="http://dx.doi.org/10.1021/jm9602928" target="_window"&gt;study of molecular frameworks in drugs&lt;/a&gt;.  As defined, frameworks consist of rings joined by linkers and are generated by removing side-chain atoms.  Frameworks are defined at two levels depending on whether atom and bond types are encoded (atomic) or not (graph).  A total of 2505 atomic frameworks were found in a database of 5120 drugs, and 1908 (76%) of these are unique (occur only once).  A relatively small number (41) of atomic frameworks account for 1235 (24%) of the drugs in the database.  A similar analysis was performed for graph frameworks, of which there are naturally fewer, and 32 graph frameworks were found to account for half the drugs in the database.  However, you’ll probably need to convince yourself that the differences between piperidine, morpholine, tetrahydropyran, pyrimidine, pyrazine, cyclohexane and benzene (all of which reduce to the same six-membered ring graph) are not important if you’re going to find the graph framework analysis useful.  Is thiophene more like tetrazole than benzene?  Are you feeling lucky?&lt;br /&gt;&lt;br /&gt;The primary output of the analysis is a set of frameworks and the frequencies with which they occur in the database.  Should you worry if the framework for your active series is found only once in the database?  Is it folly to replace a carboxylate with a tetrazole if that turns a framework that occurs twice in the database into one that has never been seen before?  Is a molecule in the drug database because of its framework or in spite of it?  
Would you prefer a &lt;a href="http://en.wikipedia.org/wiki/Phthalazine" target="_window"&gt;phthalazine&lt;/a&gt; to a &lt;a href="http://en.wikipedia.org/wiki/Penny_black" target="_window"&gt;Penny Black&lt;/a&gt;?  The difficulty with using results from analyses like this is that we lack a reference point for the observed frequencies. &lt;br /&gt;&lt;br /&gt;Readers of this column will be aware that we have reviewed the &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5.html" target="_window"&gt;Rule of 5&lt;/a&gt; (Ro5).  Ro5 is based on analysis of property distributions for a selection of 2245 drugs that had progressed to phase 2 efficacy studies, with minimal reference to other compounds.  The basis of Ro5 is an appeal to physical chemistry to justify the relevance of the chosen properties to intestinal absorption, combined with an assumption that the upper tails of these distributions are unwise places to occupy.  We rather liked &lt;a href="http://dx.doi.org/10.1021/jm021053p" target="_window"&gt;another study&lt;/a&gt; that followed changes in properties of compounds as they progressed through the development process.  That study identified 594 marketed oral drugs.  The more observant among you will have noted that 594 and 2245 are considerably smaller numbers than the 5120 drugs in the database used for the framework analysis.  This raises the question of exactly what was included in the commercial database from which the 5120 drugs were selected.   &lt;br /&gt;&lt;br /&gt;We were confused by Chart 3.  This is claimed to show all six-membered rings found in the drug database.  First note the counts for benzene (433) and piperidine (5).  Now go back to Chart 2, which shows all atomic frameworks that occur at least 10 times in the drug database.  The count for benzene is still 433, so it would appear that Chart 3 actually refers to frameworks, not rings.  
In support of this view, we also note that the 4-phenylpiperidine atomic framework occurs 12 times, implying that piperidine is found more commonly linked directly to benzene than as a framework in its own right (see below).  We will return to this point in the next post when we review the extension of this analysis to side chains.  However, at this point we simply note that we just don’t see the point of Chart 3.  Perhaps our readers have better ideas and, if so, are encouraged to share them.  Were he still alive, Lord Rutherford might have been able to help out with a penetrating insight or two.  Or at least a set of &lt;a href="http://en.wikipedia.org/wiki/First_day_cover" target="_window"&gt;first day covers&lt;/a&gt; from his native New Zealand.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_9pt-rDMtsM4/Rtcu4ZmBpvI/AAAAAAAAAAU/VwqFXHycL-o/s1600-h/image002.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_9pt-rDMtsM4/Rtcu4ZmBpvI/AAAAAAAAAAU/VwqFXHycL-o/s320/image002.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5104600249125676786" /&gt;&lt;/a&gt;&lt;br /&gt;The authors of the analysis suggest that a pharmacological promiscuity parameter could be derived for each framework by dividing the number of targets hit by drugs with the framework by the number of those drugs.  We are unsure what such a parameter would tell us.  Suppose we’re looking at benzene as a framework.  In a typical drug, a single benzene ring makes a relatively small contribution to the overall size of the molecule, so we would expect the drugs with this framework to be mutually very diverse.  The opposite situation is likely for drugs with large, complex steroid frameworks, where the framework makes up most of the molecule.  Is pharmacological promiscuity defined in this manner a function of the framework or a reflection of the variability of non-framework atoms?           
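&lt;br /&gt;&lt;br /&gt;The proposed parameter is simple enough to compute.  The Python sketch here takes a mapping from drug to framework and from drug to the targets it hits, and returns distinct targets per drug for each framework.  The drug names, targets and framework labels are invented, and we have assumed that “number of targets” means distinct targets, which is our reading rather than something checked against the paper.&lt;pre&gt;
```python
from collections import defaultdict


def promiscuity(drug_framework, drug_targets):
    """For each framework: distinct targets hit by its drugs / number of its drugs."""
    drugs_by_framework = defaultdict(list)
    for drug, framework in drug_framework.items():
        drugs_by_framework[framework].append(drug)

    scores = {}
    for framework, drugs in drugs_by_framework.items():
        targets = set()
        for drug in drugs:
            targets.update(drug_targets.get(drug, ()))
        scores[framework] = len(targets) / len(drugs)
    return scores


# Invented toy data: three diverse benzene-framework drugs, two steroids.
drug_framework = {"d1": "benzene", "d2": "benzene", "d3": "benzene",
                  "s1": "steroid", "s2": "steroid"}
drug_targets = {"d1": {"T1"}, "d2": {"T2"}, "d3": {"T3", "T4"},
                "s1": {"T5"}, "s2": {"T5"}}

for framework, score in sorted(promiscuity(drug_framework, drug_targets).items()):
    print(f"{framework}: {score:.2f}")
```
&lt;/pre&gt;The benzene framework scores higher here only because its toy drugs differ in everything except the benzene ring; nothing in the calculation can tell that apart from the framework itself being promiscuous.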
&lt;br /&gt;&lt;br /&gt;This concludes the first part of our review of molecular frameworks.  In the next post we will look at side chains.  This will be most thrilling and we can barely contain our excitement.&lt;br /&gt;&lt;br /&gt;SMILES for Indexing&lt;br /&gt;&lt;span class="smiles"&gt;c1ccccc1&lt;/span&gt; &lt;br /&gt;&lt;span class="smiles"&gt;N1CCCCC1&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;c1ccccc1C2CCNCC2&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;c1nncc2c1cccc2&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-4916447376001071117?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/4916447376001071117/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=4916447376001071117' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4916447376001071117'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4916447376001071117'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/08/frameworks-and-philately.html' title='Frameworks and Philately'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_9pt-rDMtsM4/Rtcu4ZmBpvI/AAAAAAAAAAU/VwqFXHycL-o/s72-c/image002.gif' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-8010347552089683723</id><published>2007-08-24T14:53:00.000-07:00</published><updated>2010-09-26T15:07:06.022-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='travel'/><category scheme='http://www.blogger.com/atom/ns#' term='pictures'/><category scheme='http://www.blogger.com/atom/ns#' term='pharmacokinetics'/><title type='text'>The Crapshoot returns</title><content type='html'>The Crapshoot is now out of summer recess.  Before returning to the tedious literature reviews (we will leave it to our readers to decide whether it is the literature or the reviews that are tedious) with which those readers are familiar, we’ll link some photos in an attempt to relieve the Crapshoot's distinctly dreary aspect.&lt;br /&gt;&lt;br /&gt;Here's a sunset taken in the flat countryside around Siem Reap.  This is near the &lt;a href="http://en.wikipedia.org/wiki/Tonle_Sap"&gt;Tonle Sap &lt;/a&gt;(Cambodia’s Great Lake), which is a truly remarkable geographical feature.  The Tonle Sap is connected to the Mekong and flood waters flow from river to lake.  As Mekong water levels fall, the flow reverses and the lake empties into the river.  Just like pharmacokinetics!&lt;br /&gt;&lt;a href="http://lh6.google.com/gmcrapshoot/Rs8_Yv-A5QI/AAAAAAAAADU/bYMjb1MmSIE/IMG_0741.JPG"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px;" src="http://lh6.google.com/gmcrapshoot/Rs8_Yv-A5QI/AAAAAAAAADU/bYMjb1MmSIE/IMG_0741.JPG" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;People visit this part of Cambodia to see the ancient wonders of Angkor.  Here’s an Apsara from Angkor Wat.  
She is a particularly exquisite example of her type and appears to be smiling (for the camera of course).&lt;br /&gt;&lt;a href="http://lh3.google.com/gmcrapshoot/Rs9Vy_-A5WI/AAAAAAAAAE0/-mVFd1EHrSk/IMG_0248.JPG"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px;" src="http://lh3.google.com/gmcrapshoot/Rs9Vy_-A5WI/AAAAAAAAAE0/-mVFd1EHrSk/IMG_0248.JPG" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;Angkor was founded by Hindus and the final picture (taken at Phnom Kulen) shows a lingam carved into the rock of a river bed.  &lt;br /&gt;&lt;a href="http://lh5.google.com/gmcrapshoot/Rs9Abf-A5SI/AAAAAAAAADo/BV8IhCwL4B8/IMG_0777.JPG"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px;" src="http://lh5.google.com/gmcrapshoot/Rs9Abf-A5SI/AAAAAAAAADo/BV8IhCwL4B8/IMG_0777.JPG" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;In the next post, it will be business as usual.  We will be taking a look at rings and frameworks in drug databases.   Some of our readers who are familiar with this area will have a good idea about which paper will be featured.  
Strong coffee is recommended.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-8010347552089683723?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/8010347552089683723/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=8010347552089683723' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8010347552089683723'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8010347552089683723'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/08/crapshoot-returns.html' title='The Crapshoot returns'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-1035050964308002460</id><published>2007-07-23T12:50:00.000-07:00</published><updated>2010-09-26T15:07:06.028-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='enrichment'/><category scheme='http://www.blogger.com/atom/ns#' term='privileged fragment'/><category scheme='http://www.blogger.com/atom/ns#' term='privileged substructure'/><title type='text'>Privilege and promiscuity</title><content type='html'>Following on from &lt;a href="http://gmc2007.blogspot.com/search/label/rule%20of%205"&gt;our analysis of the Rule of 5&lt;/a&gt;, we will examine approaches to linking biological activity to specific substructural elements of molecular 
structure.  Our readers may have encountered the term &lt;a href="http://dx.doi.org/10.1021/jm00120a002"&gt;privileged structure&lt;/a&gt;.  These structures provide selective ligands for a range of receptors (typically GPCRs) and are clearly of interest if your target protein is in one of their portfolios.  Note that privileged structures don’t simply nail every target presented to them with unfashionably steep dose-response curves.   That is &lt;a href="http://dx.doi.org/10.1021/jm010533y"&gt;promiscuity&lt;/a&gt; and the molecules that indulge in this sort of degenerate behaviour are anything but privileged.  However, we encourage them to do it safely and not to share needles.&lt;br /&gt;&lt;br /&gt;In the next post we will look at analysis of the occurrence of substructures in drug databases.  These analyses need to address two issues.  First, the substructures need to be detected; then their frequency of occurrence must be put into context.  In some ways we are looking at a substructural equivalent to the Rule of 5.  We will illustrate the problem with reference to the 2,4-dianilinopyrimidine (let’s call it DAP) scaffold shown below.  Compounds based on this scaffold are often associated with activity against tyrosine kinases (TK), although achieving selectivity of TK inhibition with this structural type is typically challenging. &lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_9pt-rDMtsM4/RqUJSYrlBeI/AAAAAAAAAAM/ef5ZQud-2no/s1600-h/image002.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_9pt-rDMtsM4/RqUJSYrlBeI/AAAAAAAAAAM/ef5ZQud-2no/s320/image002.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5090485165279741410" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Suppose our scaffold is found to be over-represented in a database of compounds with biological activity, TK inhibition for example.  
The first question is how the extent of over-representation (or enrichment, as it is more fashionable to say) was determined.  Let’s postpone that point until the next post, when we’re getting stuck into some real literature.  But let’s not forget it, because it is quite a good question!&lt;br /&gt;&lt;br /&gt;The next question is: how was the scaffold found?  Where does scaffold end and shrubbery begin?  Once again, with all the evasiveness of a human resources professional, we’ll postpone responding to these tough questions until the next post.&lt;br /&gt;&lt;br /&gt;The observant amongst you will have noticed that the scaffold consists of three linked rings.  Could each of these not be considered a scaffold?  This is a good observation and, without giving too much of the next post away, we note that there is precedent for defining scaffolds in terms of rings.  Now here’s a little problem.  Suppose that our database of active compounds is found to be enriched (just trust us, it is) with DAPs.  It will also be enriched with benzenes, pyrimidines, 2-anilinopyrimidines, 4-anilinopyrimidines and aniline groups.  Less obviously, enrichment will also be observed for pairs of benzene rings.  One extreme scenario is that the enrichment of pyrimidine and benzene rings is entirely due to their inclusion in the DAP scaffold.  Put another way, the enrichment of these two rings is context-specific in this example.  &lt;br /&gt;&lt;br /&gt;Note that these substructural elements shouldn’t really be called privileged structures, since that term was originally used to describe well-defined scaffolds that delivered selective ligands for diverse targets.  Privileged substructure and privileged fragment are perhaps better terms and we will use both in indexing.&lt;br /&gt;&lt;br /&gt;The Crapshoot is now in summer recess.  
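&lt;br /&gt;&lt;br /&gt;Before the break, here is a toy Python sketch of the context-specific enrichment described above. It assumes a simple enrichment factor: the fraction of actives containing a substructure divided by the fraction of a reference set containing it. This is one common definition, not necessarily the one used in the literature we will review. The compound sets are invented, and each compound is reduced to the set of substructure labels it contains. Every DAP is counted as also containing pyrimidine and benzene rings. The names enrichment, actives and reference are our own.&lt;pre&gt;
```python
def enrichment(actives, reference, label):
    """Fraction of actives containing label divided by the fraction of the
    reference set containing it; values above 1 mean over-representation."""
    hits_a = sum(1 for c in actives if label in c)
    hits_r = sum(1 for c in reference if label in c)
    if hits_r == 0:
        return float("inf")  # absent from the reference: ratio undefined
    return round(hits_a * len(reference) / (len(actives) * hits_r), 2)

# Hypothetical compounds, each reduced to the set of substructures it contains.
# A DAP necessarily also contains pyrimidine and benzene rings.
DAP = frozenset({"DAP", "pyrimidine", "benzene"})
BENZENE_ONLY = frozenset({"benzene"})
PYRIMIDINE_ONLY = frozenset({"pyrimidine"})
NEITHER = frozenset()

reference = [DAP] * 4 + [BENZENE_ONLY] * 48 + [PYRIMIDINE_ONLY] * 24 + [NEITHER] * 24
actives = [DAP] * 12 + [BENZENE_ONLY] * 4 + [PYRIMIDINE_ONLY] * 2 + [NEITHER] * 2

for label in ("DAP", "pyrimidine", "benzene"):
    print(label, enrichment(actives, reference, label))
# DAP 15.0 / pyrimidine 2.5 / benzene 1.54

# Drop every DAP-containing compound from both sets and look again.
non_dap_actives = [c for c in actives if "DAP" not in c]
non_dap_reference = [c for c in reference if "DAP" not in c]
for label in ("pyrimidine", "benzene"):
    print(label, enrichment(non_dap_actives, non_dap_reference, label))
# pyrimidine 1.0 / benzene 1.0
```
&lt;/pre&gt;In this invented data set both rings look enriched overall. Outside the DAPs, however, they occur in actives and reference at exactly the same rate, which is the extreme scenario described above.&lt;br /&gt;&lt;br /&gt;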
In about a month’s time we will review approaches to identifying privileged substructures in databases of biologically active compounds.  We wish all our readers a restful and enjoyable summer.   &lt;br /&gt;&lt;br /&gt;SMILES for Indexing&lt;br /&gt;&lt;span class="smiles"&gt;c1ccccc1Nc2nc(Nc3ccccc3)ccn2&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;c2ccccc2[NH]&lt;/span&gt;&lt;br /&gt;&lt;span class="smiles"&gt;n1cnccc1&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-1035050964308002460?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/1035050964308002460/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=1035050964308002460' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1035050964308002460'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1035050964308002460'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/07/privilege-and-promiscuity.html' title='Privilege and promiscuity'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_9pt-rDMtsM4/RqUJSYrlBeI/AAAAAAAAAAM/ef5ZQud-2no/s72-c/image002.gif' height='72' 
width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5044861885602188070</id><published>2007-07-18T14:29:00.000-07:00</published><updated>2010-09-26T15:07:06.031-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='animal models'/><category scheme='http://www.blogger.com/atom/ns#' term='amusing or bizarre'/><title type='text'>The extended chemotype</title><content type='html'>&lt;a href="http://cdavies.wordpress.com/2007/06/24/spiders-web/"&gt;Lab Cat’s recent photo of a spider’s web &lt;/a&gt;reminded us of a &lt;a href="http://dx.doi.org/10.1016/j.physbeh.2004.04.058"&gt;most unusual animal model&lt;/a&gt;.  We had always thought of spiders as eight legs attached to a pair of fangs so it was a surprise to learn that they appear to have central nervous systems and that these are sensitive to pharmacological modulation.  We have to admit that the image of a spider, syringe in hand, trying to decide which leg to inject, is a little bizarre.&lt;br /&gt;&lt;br /&gt;The pharmaceutical industry generally prefers animal models to come with four (rather than eight) legs while fur is optional.  With apologies to Orwell, eight legs good, four legs better.  However, some &lt;a href="http://www.annieappleseedproject.org/limstudmodor.html"&gt;suggest trading a pair of legs for a pair of wings&lt;/a&gt; although we are unaware of how the latter would be accommodated in the Orwellian paradigm.&lt;br /&gt;&lt;br /&gt;Bats, it seems, have more in common with humans than do the rats and mice that are commonly used to model human disease.  Bats have menstrual cycles and, given the large number of ‘women in science’ blogs with authors considerably smarter and more articulate than GMC, we consider it unwise to anthropomorphize this as we have done for the smack shooting spiders.&lt;br /&gt;&lt;br /&gt;Some years ago we mentioned bats to a biologist at work.  
The resulting look suggested that white-coated orderlies would be dispatched to administer strong medication.  Clearly the existing animal models are perfectly adequate and we do not anticipate anything choking in development for lack of efficacy.  This is reassuring.&lt;br /&gt;&lt;br /&gt;In the next post we will return to the turgid literature reviews of which readers of this column will be painfully aware.   Expect a thrill a minute as we take a look at privileged substructures.  It will be a real privilege.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5044861885602188070?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5044861885602188070/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5044861885602188070' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5044861885602188070'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5044861885602188070'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/07/extended-chemotype.html' title='The extended chemotype'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-4265009130525043741</id><published>2007-07-11T14:29:00.000-07:00</published><updated>2010-09-26T15:07:06.056-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='synthesis'/><title type='text'>Because it's there: Brief update</title><content type='html'>Following &lt;a href="http://gmc2007.blogspot.com/2007/07/because-its-there.html"&gt;our less-than-reverent look at total synthesis of natural products&lt;/a&gt;, we ferreted out &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=169"&gt;a post on the topic in peter mr's blog&lt;/a&gt;.  We should really have referenced this in our previous post and hopefully have made amends for our lack of rigour.&lt;br /&gt;&lt;br /&gt;As well as the utter pointlessness of much of the total synthesis of natural products, it should also be noted that one or two of the leading lights in the field have, on occasion, been observed to confuse themselves with minor deities.  While a positive self-image is a desirable personality trait, it is not universally appreciated by others when present to an excessive degree.  Does natural product synthesis have a more than equitable share of prima donnas?  We will let you, the reader, form your own views on this topic.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-4265009130525043741?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/4265009130525043741/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=4265009130525043741' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4265009130525043741'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/4265009130525043741'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/07/because-its-there-brief-update.html' title='Because it&apos;s there: 
Brief update'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-1542303901675694979</id><published>2007-07-02T14:28:00.000-07:00</published><updated>2010-09-26T15:07:06.065-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='opinion'/><category scheme='http://www.blogger.com/atom/ns#' term='synthesis'/><title type='text'>Because it's there</title><content type='html'>We read with interest &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?m=200706"&gt;peter mr’s comments &lt;/a&gt;on an &lt;a href="http://totallyretrosynthetic.blogspot.com/2007/05/i-would-like-to-mention-that-suggestion.html"&gt;interesting proposal &lt;/a&gt;by &lt;a href="http://totallyretrosynthetic.blogspot.com/"&gt;Totally Retrosynthetic&lt;/a&gt;.  The idea is to set up what is effectively a distributed synthetic project in a form (e.g. wiki) that enables collaboration by a large group of folk, each of whom can bring something special to the project. Peter mr describes the proposal as subversive but we think it is not nearly subversive enough.&lt;br /&gt;&lt;br /&gt;We believe this idea could be taken well beyond natural product synthesis and readers of this column will be aware that we &lt;a href="http://gmc2007.blogspot.com/2007/06/unnatural-products.html"&gt;recently questioned &lt;/a&gt;why people squander resources synthesising what nature can already do quite nicely.  We are keen to see more unnatural product synthesis which demands creative input into defining the synthetic target as well as the synthetic route.  
Needless to say, distributed molecular design in an open environment may discomfort a few pharmaceutical companies, which in the grand scheme of things would be A Good Thing.  However, vendors of screening samples may take a rather more enlightened view of these subversive activities.  &lt;br /&gt;&lt;br /&gt;In his blog, peter mr likened synthetic chemistry to competitive sport.  Despite the usual cures for cancer, impotence, wrinkles and baldness that the synthesis of the target natural product will surely lead to, we all know the real reason why people synthesize natural products.  So they’re not prepared to die for their sport like the &lt;a href="http://en.wikipedia.org/wiki/George_Mallory"&gt;doomed and heroic Mallory&lt;/a&gt;, but hopefully you get the idea.  Why is it that way with natural product synthesis?  Because, as &lt;a href="http://en.wikipedia.org/wiki/Henry_kissinger"&gt;Henry Kissinger &lt;/a&gt;might have put it, there’s so little at stake.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-1542303901675694979?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/1542303901675694979/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=1542303901675694979' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1542303901675694979'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1542303901675694979'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/07/because-its-there.html' title='Because it&apos;s there'/><author><name>Georg-Martin 
Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-7552571881273883764</id><published>2007-06-25T13:29:00.000-07:00</published><updated>2011-01-09T04:27:14.468-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='opinion'/><category scheme='http://www.blogger.com/atom/ns#' term='metric'/><category scheme='http://www.blogger.com/atom/ns#' term='rule of 5'/><category scheme='http://www.blogger.com/atom/ns#' term='organisational'/><title type='text'>The Rule of 5: Sociological fallout</title><content type='html'>In 1989 the Berlin Wall came tumbling down, and today the centrally planned economy can only be found in The People’s Republics of North Korea and Cuba.  And, just in case you forget, a number of large pharmaceutical companies.&lt;br /&gt;&lt;br /&gt;In this post, the final in a series (see &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5-warm-up.html"&gt;1&lt;/a&gt;, &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5.html"&gt;2&lt;/a&gt;, &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5-riding-wake.html"&gt;3&lt;/a&gt;, &lt;a href="http://gmc2007.blogspot.com/2007/06/rule-of-5-milking-sacred-cow.html"&gt;4&lt;/a&gt;) on &lt;a href="http://dx.doi.org/10.1016/S0169-409X(00)00129-0"&gt;The Rule of 5&lt;/a&gt;, we examine some of the sociological fallout of Ro5.  Our training is in the physical rather than the social sciences, so we apologise in advance to any sociologists, economists and organisational professionals who may be reading this. &lt;br /&gt;&lt;br /&gt;Many pharmaceutical companies today are distracted by a rather unhealthy focus on The Process of drug discovery.  
New organisational models (OM) are continually introduced on the advice of people who are paid to introduce new OMs.   Cynically, we wonder: if last year’s OM is now so bad that it needs to be replaced by this year’s OM, then why was it introduced in the first place?  When the latest of a sequence of OMs is introduced, the recurring theme is that it will be different this time, while previous OMs disappear without trace in a manner that can only be described as Orwellian.  A key component of this organisational paradigm recalibration is The Metric.&lt;br /&gt;&lt;br /&gt;The most important feature of The Metric is that it be measurable.   A connection to something useful or relevant is quite nice to have but entirely secondary to the grail of measurability.  The emergence of The Metric as a basis function of the modern organisational wavefunction is an inevitable consequence of the chronic Physics Envy from which Management Science suffers. The term Physics Envy, which we first encountered in Gould's The Mismeasure of Man, refers to a longing in, dare we say, softer disciplines for the quantitative rigor of physics.  The main symptom of Physics Envy is to see only the numbers of physics and not the underlying theoretical basis from which those tantalising numbers were distilled.  By now we can hear you all asking what this load of bullshit has to do with Ro5.   Please read on and all will be revealed.&lt;br /&gt;&lt;br /&gt;The link, of course, is that Ro5 is a metric, or more accurately a connected group of metrics.  Quantification is simply not an issue, owing to the exemplary care with which Ro5’s creators have defined it, and there is the additional bonus of a connection with oral bioavailability.  Ro5 is, with apologies for our coarseness, the research manager’s wet dream.  They would, of course, prefer to be called Leaders, since Manager is just so y2k.  
Another evolutionary advantage that contributes to Ro5’s fitness in the organisational environment is that much of it is defined in terms of nice, comfortable integers.  Generally, research managers find integers reassuring even when the precision that they convey is illusory.  In contrast, the floating-point world is a rather chaotic and unfriendly place, since any smart-ass scientist can ask you about the error bars and other things that really should not be discussed in polite company.  In some quarters Ro5 is a box to be ticked and has become an end in its own right rather than a means to an end.  Beware the Metric, for it is a good servant but a poor master.  And some might say that Ro5 is the uber-metric.&lt;br /&gt;&lt;br /&gt;Sociology, as we understand it, involves the study of the different groups that make up societies and how they interact.  Those of you who have worked in Big Pharma will be aware that a number of people who work in Drug Discovery departments don’t have much to do with the search for new medicines.  One group that appears to be growing consists of those who seem to be paid primarily to have and express opinions.  An ever-filling silo of metrics provides ample fodder for these self-styled opinion leaders, allowing them to pronounce on the optimal combination of polar surface area and covariance-scaled hyperpolarizability to achieve industry-leading mitochondrial penetration.   We prefer to characterise these folk as merely opinionated and have noticed that they appear discomforted by people outside their group expressing opinions of their own.  This is really about division of labor.  We have chemists to make compounds, biologists to assay compounds, opinion-havers to have opinions and managers to read bullet points on autocue and admire each other’s navels.  However, there is one very basic problem with allowing individuals to specialise in having opinions.  
With apologies in advance for uncouthness, we note that opinions are like hemorrhoids.  Any asshole can have them!&lt;br /&gt;&lt;br /&gt;Is it fair to blame the Metrication of Drug Discovery on Ro5?  Has this metrication led to our leaders becoming enslaved by their creations?  Are the authors of this column overly preoccupied with conspiracy theories?   We are simple folk and leave it to you, the reader, to answer these questions for yourselves.  This concludes our extended commentary on The Rule of 5 and we hope that you have enjoyed it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-7552571881273883764?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/7552571881273883764/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=7552571881273883764' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7552571881273883764'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/7552571881273883764'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/06/rule-of-5-sociological-fallout.html' title='The Rule of 5: Sociological fallout'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-8620301291761697069</id><published>2007-06-15T14:45:00.000-07:00</published><updated>2010-09-26T15:07:06.084-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='opinion'/><category scheme='http://www.blogger.com/atom/ns#' term='synthesis'/><title type='text'>Unnatural Products</title><content type='html'>Unnatural products have featured recently in blogs.  A &lt;a href="http://blogs.nature.com/thescepticalchymist/2007/06/a_knights_tale.html"&gt;well-deserved knighthood &lt;/a&gt;was noted in &lt;a href="http://blogs.nature.com/thescepticalchymist/"&gt;The Sceptical Chymist &lt;/a&gt;and &lt;a href="http://scienceblogs.com/moleculeoftheday/2007/06/nanokid_yes_this_got_federal_f.php#more"&gt;NanoKid&lt;/a&gt; (a &lt;a href="http://dx.doi.org/10.1021/jo0349227"&gt;NanoPutian&lt;/a&gt;) starred as &lt;a href="http://scienceblogs.com/moleculeoftheday/"&gt;Molecule of the Day&lt;/a&gt;.  The comment on federal funding got us thinking, 'why not?'  After all, synthesis of unnatural products almost always involves some element of molecular design.  The creativity extends beyond the (frequently demanding) synthesis.&lt;br /&gt;&lt;br /&gt;We do not seek to trivialize the difficulties of total synthesis of something with 57 chiral centers that has been liberated from some hapless creature living at the bottom of the Marianas Trench.  However, we wonder whether this effort could not have been put to more productive use making something that is a little less familiar to the residents of the Marianas Trench.  At the risk of being coarse, we observe that total synthesis of natural products can occasionally appear to be of a somewhat masturbatory aspect. &lt;br /&gt;&lt;br /&gt;Interestingly, many pharmaceutical companies seek to recruit people trained in natural product synthesis as medicinal chemists.  
These people, whose entire research experience is synthesizing molecules that nature has chosen for them, are expected to switch to designing molecules.  Meanwhile there is a trend in the pharmaceutical industry towards outsourcing synthesis to lower cost locations.  This includes project compounds as well as general purpose screening library compounds.  Should Pharma be more interested in NanoPutians than homogenised Marianas Trenchians?  We are simple folk and it is not for us to say.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-8620301291761697069?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/8620301291761697069/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=8620301291761697069' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8620301291761697069'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/8620301291761697069'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/06/unnatural-products.html' title='Unnatural Products'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-128694231597217211</id><published>2007-06-05T14:15:00.000-07:00</published><updated>2012-01-27T13:14:06.190-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sacred cows'/><category 
scheme='http://www.blogger.com/atom/ns#' term='astex'/><category scheme='http://www.blogger.com/atom/ns#' term='rule of 5'/><category scheme='http://www.blogger.com/atom/ns#' term='rule of 3'/><category scheme='http://www.blogger.com/atom/ns#' term='fragment screening'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><category scheme='http://www.blogger.com/atom/ns#' term='ddt'/><title type='text'>The Rule of 5: Milking the sacred cow</title><content type='html'>Given the huge influence of &lt;a href="http://dx.doi.org/10.1016/S0169-409X(00)00129-0"&gt;Ro5&lt;/a&gt;, it was only natural that others would attempt to cash in on some of this influence. If Ro5 is indeed a Sacred Cow then perhaps it is not too unfair to suggest that some have sought to milk it. We illustrate this phenomenon with a discussion of The Rule of 3 which follows on from our &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5-riding-wake.html"&gt;previous post on Ro5&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Ro3 was published in 2003 as a short &lt;a href="http://dx.doi.org/10.1016/S1359-6446(03)02831-9"&gt;item in a journal discussion forum&lt;/a&gt;. The rule applies to the selection of compounds for fragment screening (see &lt;a href="http://dx.doi.org/10.1021/jm040031v"&gt;review article &lt;/a&gt;and posts from &lt;a href="http://totallymedicinal.wordpress.com/tag/fragments/"&gt;TotallyMedicinal&lt;/a&gt; and &lt;a href="http://sanjayat.wordpress.com/2006/11/05/tethering/"&gt;Whistling in the Wind&lt;/a&gt;). We know little about this subject and have no experience whatsoever in the area. However we maintain the finest traditions of the pharmaceutical industry by not letting this inhibit us from having opinions on the subject.&lt;br /&gt;&lt;br /&gt;Ro3 effectively scales down Ro5 for fragment screening libraries. The rule suggests that hits from fragment screening have MW &lt;300Da, ClogP &lt;=3, HB-donors &lt;=3 and HB-acceptors &lt;=3. 
The rule's creators also suggest keeping rotatable bonds and polar surface area at or below 3 and 60 Å² respectively. As presented, Ro3 raises questions about how the hydrogen bonding groups are defined. You will recall that Ro5 defines all N and O as acceptors and all NH and OH as donors. This can be criticized (should a tertiary amide nitrogen be classed as an acceptor?) but at least the donors and acceptors are specified with sufficient precision to allow all but the most innumerate to establish whether a molecule breaks the rule. If Ro3 uses the Ro5 definitions of hydrogen bonding (under which every donor is also an acceptor), then the restriction of &lt;=3 donors is completely redundant because it is already enforced by &lt;=3 acceptors. &lt;br /&gt;&lt;br /&gt;Now let’s take a look at &lt;a href="http://dx.doi.org/10.1016/S0968-0896(02)00239-0"&gt;tetrazole&lt;/a&gt;, the classic carboxylic acid isostere found in the Angiotensin II receptor antagonist &lt;a href="http://www.jbc.org/cgi/content/full/279/15/15248/FIG1"&gt;candesartan&lt;/a&gt;. There are 4 nitrogen atoms in the tetrazole ring (the name is a bit of a giveaway) and Ro5 would count 4 acceptors and 1 donor. So if Ro3 uses the Ro5 hydrogen bonding model, tetrazoles would miss the fragment screening fun, as would a number of acidic sulfonamides. Looking at tetrazole a bit differently, you might say that the nitrogen with the hydrogen isn’t really an acceptor because its lone pair participates in the aromaticity of the ring while the lone pairs of the other nitrogens are mere spectators. However a tetrazole will ionize under physiological conditions to give an anion in which all 4 nitrogens can function as hydrogen bond acceptors. Confused? Don’t worry, so are we!&lt;br /&gt;&lt;br /&gt;So we still haven’t decided whether Ro3 allows us to put 5-phenyltetrazole into the screening library that we’re building. It’s a must-win project (aren't they all?) and all known ligands are anionic. 
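&lt;br /&gt;&lt;br /&gt;To make the counting argument concrete, here is a minimal Python sketch of Ro5-style hydrogen-bond counting applied to the two Ro3 hydrogen-bonding limits. It assumes the Ro5 model, in which every N and O is an acceptor. It also assumes an atom-based reading of the donor count: one donor per N or O atom that carries at least one hydrogen. The class and function names are hypothetical, and atom counts are entered by hand rather than computed by any cheminformatics toolkit. Under these assumptions 5-phenyltetrazole fails on acceptors, and the exhaustive loop confirms that the donor limit never bites on its own.

```python
# Hypothetical sketch: Ro5-style hydrogen-bond counts applied to the Ro3 limits.
# Atom counts are entered by hand; nothing here parses real structures.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    n_count: int   # nitrogen atoms
    o_count: int   # oxygen atoms
    nh_count: int  # nitrogen atoms carrying at least one hydrogen
    oh_count: int  # oxygen atoms carrying at least one hydrogen

    def __post_init__(self) -> None:
        # An N-H (or O-H) donor atom is itself an N (or O) atom.
        if self.nh_count > self.n_count or self.oh_count > self.o_count:
            raise ValueError("more donor atoms than N/O atoms")


def ro5_acceptors(f: Fragment) -> int:
    """Ro5 model: every nitrogen and oxygen counts as an acceptor."""
    return f.n_count + f.o_count


def ro5_donors(f: Fragment) -> int:
    """Atom-based reading of 'sum of OH and NH' (an assumption)."""
    return f.nh_count + f.oh_count


def ro3_hbond_violations(f: Fragment) -> list[str]:
    """Return which Ro3 hydrogen-bonding limits (3 donors, 3 acceptors) are exceeded."""
    problems = []
    if ro5_donors(f) > 3:
        problems.append("donors")
    if ro5_acceptors(f) > 3:
        problems.append("acceptors")
    return problems


# 5-phenyltetrazole: four ring nitrogens, one of which carries the N-H.
phenyltetrazole = Fragment("5-phenyltetrazole", n_count=4, o_count=0, nh_count=1, oh_count=0)
print(ro5_acceptors(phenyltetrazole), ro5_donors(phenyltetrazole))
print(ro3_hbond_violations(phenyltetrazole))

# Exhaustive check over small fragments: the donor limit is never broken
# unless the acceptor limit is broken too, i.e. it is redundant.
for n in range(7):
    for o in range(7):
        for nh in range(n + 1):
            for oh in range(o + 1):
                found = ro3_hbond_violations(Fragment("test", n, o, nh, oh))
                if "donors" in found:
                    assert "acceptors" in found
print("donor limit redundant for all fragments checked")
```

The redundancy depends on the atom-based reading. If donors were instead counted per hydrogen, so that an NH2 group counts twice, donors could outnumber acceptors and the argument would no longer hold.&lt;br /&gt;&lt;br /&gt;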
Acidic sulfonamides may be similarly forbidden and we’re not finding Ro3 a whole lot of help right now. Our view is that if you’re going to publish a rule based on counting things, you do need to say exactly what those things are, especially when people are going to cite your rule in the literature and &lt;a href="http://www.maybridge.com/portal/alias__Rainbow/lang__en/tabID__177/DesktopDefault.aspx"&gt;market screening libraries &lt;/a&gt;based on your rule. Are we being overly pedantic? We will let you, the reader, decide.&lt;br /&gt;&lt;br /&gt;Perhaps we can answer some of the questions by doing some literature searching. Most of Ro3’s creators were reunited in a &lt;a href="http://dx.doi.org/10.1021/jm050850v"&gt;2006 publication &lt;/a&gt;which does cite Ro3, suggesting that it still represents their views to some extent. Now go back and check the link again and take a really, really good look at it. Count the nitrogens in the fragment in the graphical abstract with the IC50 of 0.33 mM. One, two, three, four! Everyone get four? Excellent! What an attentive class you've been! If Ro5 definitions of hydrogen bonding are used, it would appear that Ro3's creators are exploiting a fragment that violates their own rule. How very naughty that would be! As an aside, we hope that you’ve noticed that this tetrazole is different from the candesartan tetrazole because it is linked thru nitrogen and can’t ionize. We don’t think that this nitrogen will actually function as an acceptor, although it will augment the acceptor ability of the other nitrogen atoms. But this is not the Ro5 model of hydrogen bonding. The view from The Grassy Knoll might be that the real purpose of Ro3 is to spread confusion and dissuade others from using fragments that its creators would prefer to reserve for their own use. 
However much we enjoy conspiracy theories, we do not subscribe to this extreme view, which we believe to be overly paranoid.&lt;br /&gt;&lt;br /&gt;We now examine how Ro3 treats molecular weight. Before Ro3’s creators present their rule, they note a MW range of 100-250Da for fragment libraries screened using high-throughput X-ray crystallography. They then refer to some analysis of fragment hits which suggested that these obeyed a Rule of 3. This raises more questions than it answers. Did they actually screen anything with MW greater than 300Da, or even the 250Da that they first mention? If so, did the larger fragments actually fail to hit or were they just too insoluble? Are the cutoffs absolute or do they, by analogy with Ro5, allow a defined fraction of acceptable fragments to lie above the rule's limit?&lt;br /&gt;&lt;br /&gt;Cynically, we wonder how much of the detail of Ro3 has been imposed in an attempt to milk Ro5. We read of &lt;a href="http://dx.doi.org/10.1021/jm000164q"&gt;NMR screening libraries having an average MW of 200Da &lt;/a&gt;and wonder whether a 10% Ro5-like cutoff of 250Da might be more appropriate. Unfortunately the Rule of 2.5 doesn't quite have the same bite while doing some quite horrid things to hydrogen-bonding groups that would render electrons quite irrelevant. When your rule is based on 5, unit differences are less noticeable. And of course the main problem with setting up integer-based rules for fragment selection is that &lt;a href="http://gmc2007.blogspot.com/2007/05/lady-bracknell-sacred-cattle-fast-food.html"&gt;The Rule of 2&lt;/a&gt; has already been taken by the formidable Lady Bracknell. So it really could never have been anything other than the rule of 3. Misfortune, carelessness or nascent dairy industry? We are simple folk and it is not for us to say.&lt;br /&gt;&lt;br /&gt;This concludes our technical review of Ro5 and in the next posting in the series we will examine some of its 'sociological' fallout. 
We hope that you have enjoyed the commentary so far.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2007/06/rule-of-5-sociological-fallout.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-128694231597217211?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/128694231597217211/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=128694231597217211' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/128694231597217211'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/128694231597217211'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/06/rule-of-5-milking-sacred-cow.html' title='The Rule of 5: Milking the sacred cow'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-6575853030979514020</id><published>2007-05-27T12:01:00.000-07:00</published><updated>2010-09-26T15:07:06.128-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='update'/><title type='text'>Fixed some links</title><content type='html'>Some links in earlier posts were not working properly.  We have fixed these and reorganised the labels to make material easier to find. 
We realise that it is a bit naughty to edit material that has already been posted and we apologise to our readers for doing this. We hope that you understand.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-6575853030979514020?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/6575853030979514020/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=6575853030979514020' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6575853030979514020'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6575853030979514020'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/05/fixed-some-links.html' title='Fixed some links'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-2101230840392762566</id><published>2007-05-27T04:28:00.000-07:00</published><updated>2011-01-09T04:27:46.370-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oral drugs'/><category scheme='http://www.blogger.com/atom/ns#' term='rule of 5'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>The Rule of 5: Riding the wake</title><content type='html'>As noted in a &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5.html"&gt;previous post&lt;/a&gt;, &lt;a 
href="http://dx.doi.org/10.1016/S0169-409X(96)00423-1"&gt;Ro5&lt;/a&gt; is based on analysis of property distributions for a selected set of orally-dosed drugs; the analogous distributions for compounds that were not orally-dosed drugs were not examined. Defining a relevant set of compounds that are not orally-dosed drugs requires some thought because there are a number of reasons (e.g. lack of biological activity, better compounds in series, binding to anti-target, invalidation of target, portfolio changes, patent scoop, conservative leadership, internal politics...) why a compound might fail to gain membership of this exclusive club. In this post we feature some publications that follow in Ro5's wake.&lt;br /&gt;&lt;br /&gt;Six years after Ro5's entrance on the Pharma stage, 'A Comparison of Physicochemical Profiles of Development and Marketed Oral Drugs' appeared in the literature (&lt;a href="http://dx.doi.org/10.1021/jm021053p"&gt;&lt;em&gt;J. Med. Chem.&lt;/em&gt; &lt;strong&gt;2003&lt;/strong&gt;, &lt;em&gt;46&lt;/em&gt;, 1250-1256&lt;/a&gt;). This is an important publication because it fills gaps left by the Ro5 study, in particular the lack of comparison of orally-dosed drugs with a relevant set of other compounds. In this analysis, orally-dosed drugs are defined by 594 marketed oral drugs from the 1999 Physicians' Desk Reference and property distributions are compared with those for compounds at various stages of the development process (e.g. phase 1, discontinued phase 1...).&lt;br /&gt;&lt;br /&gt;A number of significant trends are observed, including a steady decrease in molecular weight progressing thru the development process and fewer rotatable bonds in marketed oral drugs than in compounds earlier in development. On the subject of rotatable bond count, the authors do note that this quantity correlates with molecular weight and we will return to this theme in a future post. 
An interesting comparison (see Table 3 in the article) is made between the 90% cutoffs for marketed oral drugs and for the USAN library used to derive Ro5. With the exception of logP, for which different prediction algorithms were used in the two studies, each value below which 90% of the data set lies is lower for marketed oral drugs (MW: 473, H-bond donors: 4, H-bond acceptors: 7). This paper provides original insight while successfully building on what has come before; it is well worth reading.&lt;br /&gt;&lt;br /&gt;Significant trends are necessary but not sufficient to establish a cause-and-effect relationship. The early development compounds of today are the launched drugs of tomorrow. Could it be that drug molecules are just getting bigger because the low-hanging fruit have already been picked? The authors of 'Characteristic Physical Properties and Structural Fragments of Marketed Oral Drugs' (&lt;a href="http://dx.doi.org/10.1021/jm030267j"&gt;&lt;em&gt;J. Med. Chem.&lt;/em&gt; &lt;strong&gt;2004&lt;/strong&gt;, &lt;em&gt;47&lt;/em&gt;, 224-232&lt;/a&gt;) assert that average molecular properties of drugs do not change significantly with launch date. However, it is noted in 'Time-Related Differences in the Physical Property Profiles' (&lt;a href="http://dx.doi.org/10.1021/jm049717d"&gt;&lt;em&gt;J. Med. Chem.&lt;/em&gt; &lt;strong&gt;2004&lt;/strong&gt;, &lt;em&gt;47&lt;/em&gt;, 6338-6348&lt;/a&gt;) that oral drugs approved in each of the years from 1983 to 2002 have larger median molecular weights than those approved in 1982 and earlier. Mean values of molecular weight, (O+N), hydrogen bond acceptors, rotatable bonds and rings were significantly higher for oral drugs launched in 1983 through 2002 than for pre-1983 drugs. 
Neither of these two papers suggests that the more recently launched drugs are more lipophilic.&lt;br /&gt;&lt;br /&gt;We discussed effects of correlations between properties in the &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5.html"&gt;previous Ro5 post&lt;/a&gt; and it is interesting to see how these are treated. Correlations between 12 properties are shown in Table 2 of &lt;a href="http://dx.doi.org/10.1021/jm030267j"&gt;JMC47:224&lt;/a&gt;, where the authors note that these can be narrowed down to 8 representative properties on the basis of a correlation threshold of R=0.9. A rather unusual approach to examining correlations is used in &lt;a href="http://dx.doi.org/10.1021/jm049717d"&gt;JMC47:6338&lt;/a&gt;, where the authors derive a regression equation for ClogP with molecular weight, (O+N) and (OH+NH) as the X-variables. We would have thought that principal component analysis would have been the tool of choice here and were left wondering why the chosen path had been taken.&lt;br /&gt;&lt;br /&gt;We are simple folk and are left a bit confused by all of this. The pre-1983 drugs do appear to be smaller than those launched at a later date. Was this because of a steady increase with time before 1983 or an initial sharp rise to a plateau that may have been reached 40 years ago? No temporal connection is demonstrated between changes in properties of launched drugs and changes in those properties progressing thru development. Is 1982 the most appropriate point to make the cut, does it matter whether launched drugs are getting bigger, and should we even care? We are also confused by the absence of a description of ionisation in these analyses (and indeed in the original Ro5 analysis), since this property is likely to be orthogonal to the (correlated) descriptors that are used. 
We suggest that ionisation may also occasionally have a modest effect on physical properties and hope that our readers (and the owners of the &lt;em&gt;auto-da-fé&lt;/em&gt;) do not find this view too sacrilegious.&lt;br /&gt;&lt;br /&gt;In the next post on this topic we will take a look at Ro5's little sister. Her name is Ro3.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2007/06/rule-of-5-milking-sacred-cow.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-2101230840392762566?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/2101230840392762566/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=2101230840392762566' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2101230840392762566'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2101230840392762566'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/05/rule-of-5-riding-wake.html' title='The Rule of 5: Riding the wake'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-6855396275517855820</id><published>2007-05-20T13:32:00.000-07:00</published><updated>2011-01-09T04:27:57.645-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gossip'/><title type='text'>Breaking the 
rules</title><content type='html'>We pause briefly in our Ro5 review to put the spotlight on those who break rules. We hope that Miss Hilton enjoys her sojourn at the Century Regional Detention Center in Lynwood, Los Angeles. Who amongst us can claim to be totally free of schadenfreude when a misadventure such as this befalls an heiress financially well-endowed enough to be able to put a real hotel on Robson Street? Surely it will be a cruel and unusual punishment to be denied use of her cell phone while having to wash her hair with standard-issue regional detention center shampoo. Of course, as many of our readers will be aware, Big Pharma has &lt;a href="http://www.bu.edu/econ/isp/clip/S01/005.htm"&gt;its very own high-profile jailbird&lt;/a&gt;. What a naughty boy!&lt;br /&gt;&lt;br /&gt;Go to jail, go directly to jail, do not pass go, do not collect $200...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-6855396275517855820?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/6855396275517855820/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=6855396275517855820' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6855396275517855820'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/6855396275517855820'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/05/breaking-rules.html' title='Breaking the rules'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' 
height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5391347244284649092</id><published>2007-05-13T10:36:00.001-07:00</published><updated>2011-01-09T04:28:12.960-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oral drugs'/><category scheme='http://www.blogger.com/atom/ns#' term='rule of 5'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>The Rule of 5</title><content type='html'>&lt;p align="left"&gt;This review of Ro5 follows our &lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5-warm-up.html"&gt;earlier post&lt;/a&gt; on the subject. Ro5 was introduced about a decade ago in &lt;a href="http://dx.doi.org/10.1016/S0169-409X(96)00423-1"&gt;Adv Drug Deliv Rev 23 (1997) 3-25 &lt;/a&gt;which was reprinted in 2001. The article is still essential reading for those working in drug discovery and has proved extremely influential in the pharmaceutical industry. Our view is that while there are sound reasons for avoiding extremes of lipophilicity and molecular size, Ro5's creators do not present much evidence that violating the rule actually leads to lower solubility and permeability.&lt;br /&gt;&lt;br /&gt;The rule of 5 states that poor permeation or absorption is likely when:&lt;br /&gt;&lt;br /&gt;There are more than 5 H-bond donors (sum of OH and NH) in the molecule&lt;br /&gt;There are more than 10 H-bond acceptors (sum of N and O) in the molecule&lt;br /&gt;The MWT exceeds 500&lt;br /&gt;Log P exceeds 5 (or MlogP is over 4.15)&lt;br /&gt;Compound classes that are substrates for biological transporters are exceptions to the rule&lt;br /&gt;&lt;br /&gt;Ro5 is based on analysis of a library (USAN) of 2245 orally-dosed drugs likely to have superior physicochemical properties. 
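&lt;br /&gt;&lt;br /&gt;The four numerical criteria listed above are simple enough to express as a check. The following is a minimal Python sketch of our own, not the implementation used by Ro5's creators, and the names are hypothetical. Descriptor values, including which logP method is used, must be supplied by the caller, and the MlogP cutoff of 4.15 can be passed in as an option. The function only reports which cutoffs are exceeded and does not decide how many violations should trigger concern. It cannot capture the exception for transporter substrates.

```python
# Hypothetical sketch: report which Ro5 cutoffs a compound exceeds.
# Descriptor values are supplied by the caller; nothing is calculated here.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mw: float       # molecular weight
    logp: float     # calculated logP (the calculation method is the caller's choice)
    donors: int     # sum of OH and NH
    acceptors: int  # sum of N and O


def ro5_violations(c: Compound, logp_cutoff: float = 5.0) -> list[str]:
    """Return the names of the Ro5 cutoffs exceeded by c.

    Pass logp_cutoff=4.15 when c.logp is an MlogP value.
    """
    checks = {
        "donors": c.donors > 5,
        "acceptors": c.acceptors > 10,
        "mw": c.mw > 500,
        "logp": c.logp > logp_cutoff,
    }
    return [name for name, exceeded in checks.items() if exceeded]


# Invented descriptor values, for illustration only.
example = Compound("hypothetical compound", mw=520.6, logp=5.4, donors=2, acceptors=8)
print(ro5_violations(example))
other = Compound("another hypothetical compound", mw=320.0, logp=4.5, donors=1, acceptors=5)
print(ro5_violations(other, logp_cutoff=4.15))
```

The descriptor values in the example are invented. With these values, the first call reports the molecular weight and logP cutoffs as exceeded, and the second call reports only the MlogP cutoff.&lt;br /&gt;&lt;br /&gt;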
Cutoffs in the parameters that define Ro5 were set so that about 10% of the drugs in USAN exceeded each cutoff. The observed cutoffs were all found to be close to 5 or a multiple of 5, leading to the simple mnemonic that the authors called the rule of 5. This approach should be contrasted with the sort of analysis that attempts to classify compounds as soluble/insoluble, druglike/un-druglike, hERG/un-hERG etc. using training sets with representatives from each class.&lt;/p&gt;&lt;p align="left"&gt;The creators of Ro5 do make one comparison between the USAN library and the entire WDI data set from which it was selected. They state that molecular weights of the compounds in the 2245 USAN library were lower than those in the complete 50427 WDI data set. The proportions of compounds in the USAN library and full WDI set with molecular weights exceeding 500 were 11% and 22% respectively. At the risk of appearing unduly anal, we note that simply observing the proportions of two distributions that lie outside a cutoff does not allow conclusions to be drawn about differences in the mean values of the distributions. An alternative hypothesis could be that the means are not significantly different but the variance for the whole WDI data set is greater (we guess that it probably is). Were this the case, a line drawn at a suitably low MWT would suggest that the USAN library was of higher average MWT than the whole WDI data set. The practice of slicing distributions is a commonly employed tactic in medicinal chemistry data analysis and we expect to review specific examples in future posts. &lt;/p&gt;&lt;p align="left"&gt;Ro5 presents two interesting asymmetries. The first is between hydrogen bond donors and acceptors. Are donors inherently more evil than acceptors? Does hydrogen bonding have an even darker side? 
Does the amide NH, in transit thru the core of the membrane, remember that it had earlier been interacting with an oxygen lone pair rather than a water molecule's hydrogen atom? The creators of Ro5 note that "there is far more variation in hydrogen bond acceptor than donor ability across atom types". While we believe this assertion to be essentially correct, we note that strong hydrogen bond donors (e.g. 4-nitrophenol) are known, although these can find themselves excluded from databases like USAN for reasons unconnected with their ability to participate in hydrogen bonding. The reason for the donor-acceptor asymmetry is that, as defined by Ro5, the number of donors can at most equal the number of acceptors. Even when more sophisticated definitions of hydrogen bonding are used, donors tend to be less common than acceptors in molecules of pharmaceutical interest. Analysis based on property distributions will necessarily set a lower cutoff for donors, and you may wish to point this out tactfully to colleagues who invoke Ro5 in support of a view that donors (as opposed to acceptors) are bad for CNS penetration.&lt;/p&gt;&lt;p align="left"&gt;The second asymmetry is in how the extremes of lipophilicity are defined. High lipophilicity is a consequence of poor solvation in the aqueous phase. Lipophilic molecules will do their best to get out of an aqueous environment and escape plans include finding friends (precipitate or at least aggregate promiscuously) and seeking lodging with The Anti-targets Of The Dark Side. Too much lipophilicity is Sinful. However, if the drug is too happy (happy drugs?) in water, can it reasonably be expected to slum it in the membrane core? Ro5 eliminates excessively lipophilic drugs with a cutoff in ClogP (calculated octanol-water partition coefficient) but the inadequately lipophilic are condemned for their hydrogen bonds. 
The roots of this asymmetry lie in the hydrogen bonding character of octanol, which experimental convenience has made the default solvent for partitioning studies. Because octanol can itself donate and accept hydrogen bonds, the octanol/water partition coefficient is relatively insensitive to hydrogen bond donors. &lt;/p&gt;&lt;p align="left"&gt;One physicochemical property on which Ro5 has surprisingly little to say is ionisation. Ionisable groups in molecules can greatly increase aqueous solubility but at the cost of reducing the proportion of the neutral form that is required for passive transport thru the gut wall. &lt;/p&gt;&lt;p align="left"&gt;Is it fair to call Ro5 a Sacred Cow? Its creators present a useful and pioneering analysis clearly and honestly. Our experience suggests that those who hold Ro5 most sacred are often those who have not actually looked at the publication. Management Brahmins? It is not for us to comment. &lt;/p&gt;&lt;p align="left"&gt;This brings us to the end of our review of Ro5. In the next two posts on the subject we will highlight some related studies before concluding with a look at what we will call now (and surely regret later) the 'sociological' implications of Ro5. We hope you have found the review useful and encourage you to share your own views and opinions on the subject. 
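&lt;br /&gt;&lt;br /&gt;To return to the USAN versus WDI comparison above: the danger of slicing distributions can be illustrated with two invented normal distributions of molecular weight that share the same mean but differ in spread. This minimal sketch uses Python's statistics.NormalDist. The normal shape, means and standard deviations are made up for illustration and are not fitted to the 11% and 22% figures reported for USAN and WDI.

```python
# Hypothetical sketch: equal means, unequal spreads, different tail fractions.
# All parameters are invented for illustration.
from statistics import NormalDist

narrow = NormalDist(mu=350, sigma=100)  # stands in for a tighter, USAN-like set
wide = NormalDist(mu=350, sigma=170)    # stands in for a broader, WDI-like set


def fraction_above(dist: NormalDist, cutoff: float) -> float:
    """Fraction of the distribution lying above the cutoff."""
    return 1.0 - dist.cdf(cutoff)


def fraction_below(dist: NormalDist, cutoff: float) -> float:
    """Fraction of the distribution lying below the cutoff."""
    return dist.cdf(cutoff)


for label, dist in [("narrow", narrow), ("wide", wide)]:
    print(
        f"{label}: mean MW {dist.mean:.0f}, "
        f"above 500: {fraction_above(dist, 500):.1%}, "
        f"below 200: {fraction_below(dist, 200):.1%}"
    )
```

The wider distribution has more compounds in both tails. A cut at 500 therefore makes it look heavier than the narrow one, while a cut at 200 makes it look lighter, even though the two means are identical.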
&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5-riding-wake.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5391347244284649092?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5391347244284649092/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5391347244284649092' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5391347244284649092'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5391347244284649092'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/05/rule-of-5.html' title='The Rule of 5'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-3654670678520837781</id><published>2007-05-08T14:22:00.000-07:00</published><updated>2011-01-09T04:28:23.270-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rule of 5'/><category scheme='http://www.blogger.com/atom/ns#' term='literature reviews'/><title type='text'>The Rule of 5: Warm up</title><content type='html'>&lt;div&gt;The Rule of 5 made its entrance about 10 years ago and provided a desperately needed wake-up call to a pharmaceutical industry that had been seduced by combinatorial chemistry, high-throughput screening and Andersen Consulting 
(couldn't find hyperlink for them). It is also one of the more sacred cattle in the pasture and will be the subject of our next post. Before we do this, it is appropriate to link material from a couple of others more experienced in the blog trade. &lt;a href="http://orgprepdaily.wordpress.com/about-milkshake/"&gt;Milkshake&lt;/a&gt; sensibly notes that &lt;a href="http://orgprepdaily.wordpress.com/2007/03/11/milkshake-medicinal-wisdom/"&gt;Ro5 is fairly crude but points in the right direction&lt;/a&gt;. &lt;a href="http://totallymedicinal.wordpress.com/about/"&gt;TotallyMedicinal&lt;/a&gt; has compiled a &lt;a href="http://totallymedicinal.wordpress.com/2007/04/23/the-med-chem-literature-pile/"&gt;list of articles from the literature &lt;/a&gt;with the &lt;a href="http://www.sciencedirect.com/science?_ob=ArticleURL&amp;_udi=B6T3R-3RG55KJ-2&amp;amp;_user=10&amp;_coverDate=01%2F15%2F1997&amp;amp;_rdoc=1&amp;_fmt=&amp;amp;_orig=search&amp;_sort=d&amp;amp;view=c&amp;_acct=C000050221&amp;amp;_version=1&amp;_urlVersion=0&amp;amp;_userid=10&amp;amp;md5=23e4c4ad76ecd2a05d4137f353af8061"&gt;Ro5 paper &lt;/a&gt;as the first item, totally in keeping with its sacred cow status. If you've already blogged about Ro5, make a comment, include the link and be part of the fun!&lt;br /&gt;&lt;br /&gt;On the lighter side, we hope that Britain's Queen Elizabeth II is enjoying her visit to the US and note that Dubya and the Duke of Edinburgh really do deserve each other. Apparently The Queen once met a photographer on a state visit to Canada and the ensuing conversation went something like this:&lt;br /&gt;&lt;br /&gt;Q: "A photographer! How very interesting, my brother-in-law &lt;font color="#ff6666"&gt;&lt;em&gt;[The photographer Lord Snowdon was married to Her Majesty's late sister at the time]&lt;/em&gt; &lt;/font&gt;is a photographer".&lt;br /&gt;&lt;br /&gt;P: "Ah, how very coincidental! My brother-in-law is a queen". 
&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gmc2007.blogspot.com/2007/05/rule-of-5.html"&gt;next&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-3654670678520837781?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/3654670678520837781/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=3654670678520837781' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3654670678520837781'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/3654670678520837781'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/05/rule-of-5-warm-up.html' title='The Rule of 5: Warm up'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-2580406253263383964</id><published>2007-05-01T12:01:00.000-07:00</published><updated>2010-09-26T15:07:06.186-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sacred cows'/><category scheme='http://www.blogger.com/atom/ns#' term='rule of 2'/><title type='text'>Lady Bracknell, sacred cattle &amp; fast food</title><content type='html'>We make no claim to the saying that 'sacred cows make great hamburger'. A Google search provided a number of sources including Mark Twain but, surprisingly, not Oscar Wilde. 
The latter's seminal contribution to drug discovery is Lady B's observation that one carboxylate may be regarded as unfortunate but two looks like carelessness. May we suggest that this henceforth be known as 'The Rule of Two'.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-2580406253263383964?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/2580406253263383964/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=2580406253263383964' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2580406253263383964'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/2580406253263383964'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/05/lady-bracknell-sacred-cattle-fast-food.html' title='Lady Bracknell, sacred cattle &amp; fast food'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-1344609233489194074</id><published>2007-04-20T21:17:00.000-07:00</published><updated>2010-09-26T15:07:06.188-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sacred cows'/><title type='text'>Sacred cows make great hamburger</title><content type='html'>You might think that drug discovery is based on science. 
But take a closer look and you'll find religion everywhere in the pharmaceutical company of the early 21st century. A pantheon of neo-deities including Druglikeness, Drugability, Leadlikeness, Scoring Functions and The Rule of Five keeps the lemming parade contented and unworried. Surely we can look forward to Hitlikeness, Fragmentlikeness, Phase2likeness, Crashed &amp;amp; Burned in Developmentlikeness revealing The True Path to their enlightened followers. Huxley saw it three quarters of a century ago and called it Soma.&lt;br /&gt;&lt;br /&gt;Of course The Dark Side is also with us. The Demons include Darth hERG, The CYPth, Molecular Complexity, Rotatable Bonds, Polar Surface Area and Albumin (who frequently confuses The Faithful by joining the Gods without telling anyone). All the while, self-appointed high priests claim insight into and control over the inner workings of The Dark Side while noisily peddling parametric panaceas.&lt;br /&gt;&lt;br /&gt;To some extent this appeal to Faith is not unexpected. The odds of success in this business are vanishingly small, and providing direction for The Great Unwashed becomes tiresome if they are allowed to question the Wisdom of their Leaders. The Second Law is naturally heresy and, despite its key role in drug action, Entropy is first denied and then demonised once denial has lost its efficacy.&lt;br /&gt;&lt;br /&gt;Gotta go. 
Did we just hear the crackle of the &lt;em&gt;auto da fe &lt;/em&gt;being kindled?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-1344609233489194074?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/1344609233489194074/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=1344609233489194074' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1344609233489194074'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/1344609233489194074'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/04/sacred-cows-make-great-hamburger.html' title='Sacred cows make great hamburger'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8876332030448981936.post-5165329696044667998</id><published>2007-04-20T13:56:00.001-07:00</published><updated>2007-04-20T13:56:57.854-07:00</updated><title type='text'>Hello world</title><content type='html'>Hello World!?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8876332030448981936-5165329696044667998?l=gmc2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gmc2007.blogspot.com/feeds/5165329696044667998/comments/default' title='Post Comments'/><link rel='replies' 
type='text/html' href='http://www.blogger.com/comment.g?blogID=8876332030448981936&amp;postID=5165329696044667998' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5165329696044667998'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8876332030448981936/posts/default/5165329696044667998'/><link rel='alternate' type='text/html' href='http://gmc2007.blogspot.com/2007/04/hello-world.html' title='Hello world'/><author><name>Georg-Martin Krapper</name><uri>http://www.blogger.com/profile/15416686863175197568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry></feed>
