We promised to update you on some gossip from work and our loyal readers (all three of them) know that we do like to keep our promises. You’ll remember Group Manager who couldn’t understand why he shouldn’t be listed as an author on everything that any member of the group wanted to publish. Things had come to a head when Top Gun, having done three years as a post-doc in the lab of a Key Opinion Leader on the East Coast, was a bit disgruntled when Group Manager wanted to treat her like a graduate student. The trouble was that Group Manager was actually not a very good manager and his influencing strategy did not extend beyond setting the Caps Lock and letting rip. Relations deteriorated further and Top Gun took a secondment to the Emerging Antiviral Therapies Team and made it quite clear that she wasn’t coming back until They did something about Group Manager’s Stalinist micromanagement. Group Manager’s VP was running out of ideas and in desperation summoned GMC to his office (yes, that desperate!) to see what our slightly unorthodox approach to organisational re-alignment might offer. Here is a transcript of the discussion:
GMC: Well, there is the obvious solution, as they say, ‘pour encourager les autres’.
VP: I’m afraid we can’t do that; he is a manager after all. If we start culling them at that level it’ll only be a matter of time before it gets to the VPs.
GMC: OK, what about the ‘Two Paths, One Mission’ initiative? You can get one of those half-wits in Human Resources to move him from Management to Science with a handful of keystrokes. Make him a Junior Pharma Fellow (JPF) and your problem is solved.
VP: I’m afraid that it’s not that simple.
GMC: How so? Aren’t Group Manager and Junior Pharma Fellow (JPF) equivalent roles in the ‘Two Paths, One Mission’ initiative?
VP: Well the roles are equivalent but the salary scales are different. If you look at salary, Group Manager is actually equivalent to Pharma Fellow (PF).
GMC: So why do you say that Group Manager is an equivalent role to JPF when Group Manager is actually an equivalent role to PF?
VP: Well the management puts an equal value on the Group Manager and JPF roles but market data mean that we have to pay Group Manager more.
GMC: So what exactly is this market data? Who analyses it?
VP: The Human Resources people keep the data and it is highly confidential. Even I don’t get to look at it.
GMC: Well it looks like you’re going to have to convert him to PF. With the market data secure in the HR information black hole, you should be able to do whatever you like and cite data that nobody will ever see in support of your decision.
VP: We probably could but the problem is that going from Group Manager to PF is technically a promotion even if it doesn’t involve a salary increase.
GMC: I see the problem. You need to justify the promotion and you’ll have to tell everyone what a great scientist he is when he hasn’t been corresponding author on a journal publication since 2001.
VP: Precisely!
GMC: Well let’s see what we might use. Didn’t he help organise some conference and didn’t some university give him an Honorary Professorship?
VP: I’m not sure about the Honorary Professorship. He actually asked one of his friends there if they could sort it and things like this are really pretty worthless these days. You can buy them by giving somebody a juicy slot at a conference that you’re organising. It really is that simple and cheap!
GMC: OK why don’t we say that he’s a Key Opinion Leader (KOL)?
VP: Nice idea but there would be problems because we're already calling Senior Pharma Fellow (SPF) a Key Opinion Leader. I mean they can’t both be KOLs because SPF would throw a hissy fit. You know what his ego is like.
GMC: OK let’s call SPF ‘Thought Leader’. Wouldn’t that be tidy? Then you can call PF a Key Opinion Leader without offending SPF.
VP: What an excellent idea! I must have thought of it myself. But we still need to create a role for PF.
GMC: That shouldn’t be a problem. You can say he’s providing leadership for the JPFs.
VP: Are you sure? The JPFs are a particularly tiresome group and they’re unlikely to fall for the Key Opinion Leader farce. They’re also a lot stronger scientifically than PF so there could be real problems.
GMC: Well you didn’t like our first suggestion so I think this is all you can do. He doesn’t need to actually provide leadership for the JPFs. You just need to say that he’s providing leadership and the organisational inertia will do the rest. How about suggesting that he get them to write up some research proposals. That’ll create an illusion of leadership.
VP: Not so sure about the research proposal idea. I mean there’s no resource for that sort of thing.
GMC: The lack of resource is exactly why it is a good idea. If those meddlesome JPFs are continually writing proposals for projects that will never be resourced, they won’t have the time to create trouble.
VP: What a masterstroke! I almost hadn’t realised that I'd thought of it before. But I have one last question. I’m concerned that appointing Group Manager as PF and calling him a KOL will lose me respect among the scientific community.
GMC: That’s one thing that you don’t need to worry about.
VP: How can you be sure?
GMC: The scientific community in this company stopped taking you seriously years ago.
Any similarity between the characters in this Crapshoot and persons alive or dead is entirely coincidental. No children, animals, VPs, Group Managers, Pharma Fellows or Senior Pharma Fellows were harmed in the preparation of this Crapshoot.
Sunday, November 29, 2009
Tuesday, October 27, 2009
Is promiscuity categorically sinful?
We thought that a categorical sin would represent an excellent way to return to shooting the crap as we like to say. This particular categorical sin is connected with promiscuity although it is not clear whether the latter should be regarded as vice or virtue. We will return to that question at a later date but for the present we simply invite you to fasten your seatbelt, sit back and wallow in the sheer, undiluted sinfulness of it all.
Take a look at today’s featured article but don’t bother to read it if you’re in a hurry. Just go to Figure 2 because that’s where all the action happens. This figure claims to illustrate the relationship between promiscuity and molecular weight. Promiscuity is defined by the number of targets that the compound inhibits with an IC50 of less than 10 micromolar. As an aside it should be mentioned that 10 micromolar inhibition in an in vitro assay does not necessarily translate into in vivo inhibition. You need to know blood levels to answer that question. More precisely free blood levels but we’re not going there today because it’s an even worse place than the Laotian monastery!
Anyway back to Figure 2. There are a number of similarities between this one and the one illustrating the relationship between promiscuity and lipophilicity that starred in an earlier Crapshoot. Promiscuity is an integer and so all you need to do is calculate the average molecular weight for each value of promiscuity and you’re ready to plot. Isn’t Key Opinion Leadership easy! Well these guys did the plot and got an R-square of 0.93. Does this mean that you’d get an R-square of 0.93 for the raw data? We suspect not.
Our Loyal Readers know only too well by now what makes us a little queasy about plots like this. Very simply, variation is hidden and for those of you who’ve only just joined us we’ll try to explain. Take a look at the point that has been plotted for compounds hitting just one target. The mean molecular weight for these compounds is about 430 Da and for the sake of the discussion let’s just say that the mean is exactly 430 Da. You could get this value if all these compounds have a molecular weight (MW) of 430 Da but you’d get the same value if half the compounds had MW of 230 Da and if the other half had MW of 630 Da.
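For those who prefer to see the arithmetic, here is a minimal sketch of the same point. The two lists of molecular weights are made up for illustration (they are not taken from the featured article) and simply mirror the 430 Da versus 230/630 Da scenario.

```python
from statistics import mean, pstdev

# Made-up molecular weights (Da) for compounds that hit just one target.
# 'tight': every compound weighs 430 Da.
# 'spread': half the compounds weigh 230 Da and the other half 630 Da.
tight = [430.0] * 6
spread = [230.0] * 3 + [630.0] * 3

for label, mws in (("tight", tight), ("spread", spread)):
    # The plotted point (the mean) is identical for both sets;
    # only the standard deviation reveals the difference.
    print(label, "mean =", mean(mws), "SD =", pstdev(mws))
```

Both sets would land on exactly the same spot in a plot of means, which is why we keep asking for the standard deviations to be shown.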
Figure 2 is a plot of the trend in the data and not the data itself and we’re not sure what the R-square for the trend in the data really means. We have already noted that the R-square that you get from treating the data in this manner depends on the number of levels of promiscuity. How can you make this statement, M. le Crapshoot, without even looking at the data, we hear you cry. Patience, Esteemed Readers, you really should have read your back copies of The Crapshoot more carefully. You’ll see that the largest number of assays in which compounds are active is 18. Let’s call compounds that hit 1-9 assays less promiscuous and assign them an integer of 1. The other compounds we’ll call really promiscuous (we’ll also call the vice squad) and assign them an integer of 2. Now plot the average value of any parameter or property that you like for each group of compounds against the integer that you’ve assigned the compounds to and we suspect that you’ll get an R-square of 1. If not it’s going to be 0. This is but one of the manifestations of categorical sin.
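A rough sketch of the two-bin trick follows. The data are pure noise that we have generated ourselves (promiscuity drawn at random from 1 to 18, the property drawn independently of it), so nothing here comes from the featured article, and the helper `r_squared` is our own hypothetical function.

```python
import random
from statistics import mean

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

random.seed(0)
# Made-up data: the property is generated independently of promiscuity.
promiscuity = [random.randint(1, 18) for _ in range(500)]
prop = [random.gauss(430.0, 100.0) for _ in range(500)]

# Bin the compounds: 1 = hits 1-9 assays, 2 = hits 10-18 assays.
bins = [1 if p <= 9 else 2 for p in promiscuity]
bin_means = [mean(v for v, b in zip(prop, bins) if b == level) for level in (1, 2)]

r2_raw = r_squared(promiscuity, prop)      # all 500 data points
r2_binned = r_squared([1, 2], bin_means)   # just the two bin averages

print("R-square, raw data:   ", r2_raw)
print("R-square, binned means:", r2_binned)
```

Two points always lie on a straight line, so the binned R-square comes out as 1 (to within rounding) however little the property has to do with promiscuity. If the two averages happened to be exactly equal the calculation would divide by zero, which corresponds to the degenerate case mentioned above.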
Now you may think we’re being a bit harsh on the nice folk named in the footnote to Figure 2. After all they are honest enough to state that the standard deviation for each promiscuity value is high and they also show that the highly promiscuous compounds are much less numerous than the less promiscuous compounds. Figure 2 would have been greatly improved by displaying these standard deviations and Our Loyal Readers know only too well that it is the standard deviations which must be shown and not the standard errors. Showing standard errors instead would be truly sinful and we hope that Sensitive Readers will not have been offended by all this talk of promiscuity and standard deviants.
So what’s to be done about Figure 2? Firstly we should point out that one justification for plotting the data in this manner is that the creators of Figure 2 appear to be trying to explore the response of promiscuity to molecular weight. Most of the compounds in the data set are relatively non-promiscuous and would dominate if all data points were used. There are a couple of options for dealing with this problem. The first is simply to show the standard deviation for each promiscuity level. A second option would be to create a new data set by randomly selecting a fixed number of compounds for each promiscuity level. This new data set would be a lot more suitable for regression analysis and you could also set molecular weight to be the independent variable which is more appropriate if you’re thinking of promiscuity as a response to molecular weight. If we were going down this route we would also include compounds that don’t show activity in any assays.
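The second option might be sketched along the following lines. The function name `balanced_sample`, the made-up data set and the decision to skip any promiscuity level with too few compounds are all our own assumptions; the featured article does nothing of the sort.

```python
import random
from collections import defaultdict

def balanced_sample(compounds, n_per_level, seed=0):
    """Pick n_per_level compounds at random from each promiscuity level.

    compounds: list of (promiscuity, molecular_weight) tuples.
    Levels with fewer than n_per_level compounds are skipped
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for promiscuity, mw in compounds:
        by_level[promiscuity].append((promiscuity, mw))
    sample = []
    for level in sorted(by_level):
        members = by_level[level]
        if len(members) >= n_per_level:
            sample.extend(rng.sample(members, n_per_level))
    return sample

# Made-up data set: lots of inactive or barely promiscuous compounds,
# very few highly promiscuous ones. Level 0 means no activity in any assay.
rng = random.Random(1)
compounds = []
for level, count in ((0, 400), (1, 300), (2, 100), (3, 30), (4, 5)):
    compounds += [(level, rng.gauss(400.0, 80.0)) for _ in range(count)]

subset = balanced_sample(compounds, n_per_level=30)
print(len(compounds), "compounds reduced to", len(subset))
```

Each surviving level now contributes the same number of compounds, so the non-promiscuous majority no longer dominates a regression of promiscuity on molecular weight.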
So there you have it. As categorical sins go, this one is actually not too sinful and shouldn’t result in anything more than a transient stay in data analytic purgatory for hiding variation.
Tuesday, October 6, 2009
Summer is over and The Crapshoot is back!
Summer recess is over and having been goaded into action by one of our readers (whom we believe to number about half a dozen), it is time to re-start shooting the crap as we like to say. We have been spending the last few months in a Laotian monastery where you will not find internet, telephones or (most importantly) white-coated orderlies. Needless to say, this environment is not conducive to blogging or in fact compliance with our medication.
We will return to the gratuitous bashing of predictive modelling in due course but will first share (in the next post) a most naughty categorical sin. This has inspired us to create a ‘categorical sin’ label so that Loyal and Discerning Readers of The Crapshoot can indulge in sins of this nature more easily. There is also plenty of juicy gossip to catch up on because Group Manager has been ‘promoted’ to Pharma Fellow and his manager is desperately trying to re-package him as a Key Opinion Leader to the great amusement of those whose opinions he is supposed to be leading. Even, or should we say especially, the janitorial staff are enjoying the joke. Stay tuned!
Monday, April 20, 2009
Another year, another Senior Pharma Fellow
It is now 2 years since we started The Crapshoot and 58 posts and 20k pageloads later it has not been put out of your misery. The second year has been less eventful than the first in that we received no death threats, something that greatly disappoints us. This year marked the debut of one of our favorite characters who we have named Senior Pharma Fellow although you will know him by any of a number of names that tact prevents us from mentioning.
Wednesday, April 1, 2009
The latent indicator variable 2
The toy example in the previous post is clearly a bit of an over-simplification although it is useful for illustration of some ideas. With only two substituents, it should be pretty obvious to all but the most witless when compounds with one substituent are more active than the corresponding compounds with the other substituent.
Things get a bit more complicated when you have a number of substituents. Time for another of The Crapshoot’s annoying toy examples, for which we make no apology. If you find reading this garbage to be a painful experience then please spare a thought for those of us who have to write it.
Suppose you can now have one of 5 substituents at a particular position instead of just chlorine and the ‘un-substituent’ hydrogen. Let’s also assume classic Free-Wilson linearity-additivity in the SAR such that each substituent makes a constant (and different) contribution to activity. Although this is a rather contrived system it is not too different from the situation that exists in MedChem projects where a well-defined ranking of substituents is observed that is independent of what may be present at other positions of diversity in the molecule. If we’ve got 5 compounds each with a different one of these 5 substituents you should be able to fit whatever biological activity you observe using 5 different substituent parameters, provided that each has different values for each substituent. For example you might use sigma meta, sigma para, sigma resonance, sigma inductive, volume, cube root of the trace of the substituent polarizability tensor, ad nauseam. The key point is that it just doesn’t matter as long as each parameter has different values for each substituent. This is the curse of the Latent Indicator Variable.
Now 5 adjustable parameters and 5 compounds would really look rather like over-fitting. But suppose we’ve done this combinatorially and have another position (let’s call it B) of diversity at which we can have one of 10 substituents. Now there are 10 compounds with each one of the 5 original substituents (let’s call these the position A substituents). Now here’s the fun bit and don’t worry because we’ll hold your hand so we can do it together. We’re going to take the average pIC50 for compounds with each of the 5 position A substituents. Provided that these averages are all sufficiently different, you’ll get some sort of model when you use all the data points. And when you use all 50 data points, using 5 adjustable parameters doesn’t look quite so naughty.
The problem is that we’ve used Latent Indicator Variables and, even with 50 data points, this model only works if a compound contains one of the 5 position A substituents that we’ve used to train the model. Unfortunately the situation is less easy to spot than when we’ve only got two substituents to worry about. A compound might sit right at the centroid of the model space and the unwary would say this was interpolation. Yes, if you’re using one of the 5 position A substituents used to train this model but otherwise No.
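Here is a toy sketch of the 5 x 10 example. Everything in it is invented: the substituent labels, the additive contributions and, most importantly, the two sets of 5 'substituent parameters', which are just random numbers. Because the descriptors depend only on the position A substituent, the least-squares fit to all 50 compounds reproduces the 5 group averages, so we get the coefficients by solving a 5 x 5 linear system with a small hand-rolled `solve` helper.

```python
import random
from statistics import mean

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

rng = random.Random(0)
subs_a = ["A1", "A2", "A3", "A4", "A5"]                     # hypothetical position A substituents
contrib_a = dict(zip(subs_a, [0.0, 0.4, 0.9, 1.5, 2.2]))    # made-up additive contributions
contrib_b = [rng.uniform(-1.0, 1.0) for _ in range(10)]     # 10 position B substituents

# 50 compounds with strictly additive (Free-Wilson) pIC50 values.
pic50 = {a: [5.0 + contrib_a[a] + b for b in contrib_b] for a in subs_a}
group_means = [mean(pic50[a]) for a in subs_a]

def fit(descriptors):
    """Coefficients of the 5-parameter linear model (no intercept)."""
    return solve([descriptors[a] for a in subs_a], group_means)

def predict(coeffs, row):
    return sum(c * x for c, x in zip(coeffs, row))

# Two unrelated sets of 5 'substituent parameters': pure random numbers.
set1 = {a: [rng.uniform(-1, 1) for _ in range(5)] for a in subs_a}
set2 = {a: [rng.uniform(-1, 1) for _ in range(5)] for a in subs_a}
c1, c2 = fit(set1), fit(set2)

fitted1 = [predict(c1, set1[a]) for a in subs_a]
fitted2 = [predict(c2, set2[a]) for a in subs_a]
for a, g, f1, f2 in zip(subs_a, group_means, fitted1, fitted2):
    print(a, round(g, 3), round(f1, 3), round(f2, 3))
```

Both sets of meaningless descriptors give the same fitted values, namely the group averages, even though the coefficients are completely different. The descriptors are doing nothing more than labelling the 5 substituents.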
This is probably a good point at which to sign off. There were so many things we wanted to talk about like correlations between descriptors, why it doesn’t really make sense to use Hammett constants to model biomolecular recognition and the dangers to Civilisation posed by structural clusters in training sets. However, enough is enough and we’ll leave you with a problem that anyone who has done some ten pin bowling will be familiar with. Your first ball has knocked down all the pins except two. Anyone care to guess which two? In case you’ve not figured it out, the two pins are numbers 7 and 10. That’s why they call it a 7-10 split! They sit at opposite ends of the back row and the centroid of the model space is not going to be a whole lot of help now.
Saturday, February 28, 2009
The latent indicator variable 1
Well it does seem a while since we last posted and there is still much work to do as we continue from the previous post. The situation in which you either have chlorine or hydrogen at C4 of the phenyl should be easy to spot using any of a number of substituent parameters and comparing average pIC50 values for the two groups of compounds will give you a good idea of whether or not substitution with chloro is good for activity. If substitution with chloro at C4 leads to a consistent increase in potency, you’ll get a model that is both predictive and that can be validated. So exactly what is your point, we hear you cry.
OK let’s be a bit more specific. We’ll use the Wikipedia as our source of Hammett sigma constants. The Hammett sigma constant for meta-chloro is +0.37 and (by definition) that for hydrogen is zero. If chloro substitution leads to a significant increase in potency you should get a reasonable model by fitting pIC50 to sigma. It will satisfy validation criteria and Senior Pharma Fellow (SPF) will be able to rattle off an impressive array of quality control metrics in his next presentation. Aren’t we clever! Surely it’s time to use the model to do some predicting.
Our chemists want to know what happens if we introduce methoxy or fluoro at C4. Actually they don’t like Senior Pharma Fellow (SPF) any more than we do but there is a directive from the Project Management Politburo that these models are to be used even if they are not believed. Furthermore you need to run the model so that you can tick the relevant boxes on the Authorisation For Synthesis form that the tiresome Black-Belted Half-Wits have set up for the gathering of Base-line Productivity Indicators. At least we know that we won’t be extrapolating because the Hammett sigma values for meta-methoxy and meta-fluoro are +0.11 and +0.34 respectively so both lie within the space spanned by the training set. We’d predict that replacing chloro at C4 with fluoro would lead to a small drop in potency because the relevant Hammett sigma values are so similar. We’d be particularly confident in our predictions for the methoxy-substituted analogs because this represents interpolation to a greater extent than if we were doing predictions for the compounds with which the model was built.
Now for the sake of argument, let’s suppose we’d decided to use the Hammett constants for these substituents at the para position. The value for chlorine is now +0.23 and that for hydrogen is still zero (by definition) as before so the quality of the model is unchanged. However fluoro (sigma-para = +0.06) looks much more like hydrogen than chloro while methoxy (sigma-para = -0.27) now lies well outside the space spanned by the training set. Needless to say this is a very different picture to what we saw using sigma-meta values.
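The comparison can be sketched in a few lines. The sigma values are the ones quoted in this post; the six pIC50 values are invented for illustration (hydrogen compounds around 6, chloro compounds around 7) and `fit_line` is our own ordinary least-squares helper.

```python
from statistics import mean

# Hammett sigma values as quoted in this post.
SIGMA = {
    "meta": {"H": 0.0, "Cl": 0.37, "F": 0.34, "OMe": 0.11},
    "para": {"H": 0.0, "Cl": 0.23, "F": 0.06, "OMe": -0.27},
}

# Made-up training set: (C4 substituent, pIC50). Only H and Cl are present.
training = [("H", 5.9), ("H", 6.1), ("H", 6.0),
            ("Cl", 7.1), ("Cl", 6.9), ("Cl", 7.0)]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

predictions = {}
for scale, sigma in SIGMA.items():
    xs = [sigma[sub] for sub, _ in training]
    ys = [p for _, p in training]
    intercept, slope = fit_line(xs, ys)
    # Predict pIC50 for the training substituents and for the two new ones.
    predictions[scale] = {sub: intercept + slope * sigma[sub] for sub in sigma}
    print(scale, {sub: round(p, 2) for sub, p in predictions[scale].items()})
```

Both fits return the same values for hydrogen and chloro (the two group averages), so no validation statistic computed on the training set can tell them apart. The predictions for fluoro and methoxy, on the other hand, depend entirely on which sigma scale we happened to pick.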
What does this all mean? This is obviously a toy example that we’ve created to illustrate a point. However it is clear that if we’re building models using pIC50s for compounds that are either unsubstituted or have chloro at C4 then sigma-para will work just as well as sigma-meta. The sigma values function as indicator variables and any parameter which has different values for chloro and hydrogen substituents will do the job just as well. The problem is that for these models having anything other than hydrogen or chloro at C4 represents an extrapolation while the continuous nature of sigma constants suggests that we might be interpolating. Real models are typically a lot more complex than this toy example and it is often not clear when linear combinations of continuous variables are actually functioning as indicator variables. We’ll pick up in the next post since it is getting late and there is cider to be drunk. It should be fun and hopefully we will not encounter a latent indicator variable (LIV).
next
OK let’s be a bit more specific. We’ll use the Wikipedia as our source of Hammett sigma constants. The Hammett sigma constant for meta-chloro is +0.37 and (by definition) that for hydrogen is zero. If chloro substitution leads to a significant increase in potency you should get a reasonable model by fitting pIC50 to sigma. It will satisfy validation criteria and Senior Pharma Fellow (SPF) will be able to rattle off an impressive array of quality control metrics in his next presentation. Aren’t we clever! Surely it’s time to use the model to do some predicting.
Our chemists want to know what happens if we introduce methoxy or fluoro at C4. Actually they don’t like Senior Pharma Fellow (SPF) any more than we do but there is a directive from the Project Management Politburo that these models are to be used even if they are not believed. Furthermore you need to run the model so that you can to tick the relevant boxes on the Authorisation For Synthesis form that the tiresome Black-Belted Half-Wits have set up for the gathering of Base-line Productivity Indicators. At least we know that we won’t be extrapolating because the Hammett sigma values for meta-methoxy and meta-fluoro are +0.11 and +0.34 respectively so both lie within the space spanned by the training set. We’d predict that replacing chloro at C4 with fluoro would to lead to a small drop in potency because the relevant Hammett sigma values are so similar. We’d be particularly confident in our predictions for the methoxy-substituted analogs because this represents interpolation to a greater extent than if we were doing predictions for the compounds with which the model was built.
Now for the sake of argument, let’s suppose we’d decided to use the Hammett constants for these substituents at the para position. The value for chlorine is now +0.23 and that for hydrogen is still zero (by definition) as before so the quality of the model. However fluoro (sigma-para = +0.06) looks much more like hydrogen than chloro while methoxy (sigma-para = -0.27) now lies well outside the space spanned by the training set. Needless to say this is a very different picture to what we saw using sigma-meta values.
What does this all mean? This is obviously a toy example that we’ve created to illustrate a point. However it is clear that if we’re building models using pIC50s for compounds that are either unsubstituted or have chloro at C4 then sigma-para will work just as well as sigma-meta. The sigma values function as indicator variables and any parameter which has different values for chloro and hydrogen substituents will do the job just as well. The problem is that for these models having anything other than hydrogen or chloro at C4 represents an extrapolation while the continuous nature of sigma constants suggests that we might be interpolating. Real models are typically a lot more complex than this toy example and it is often not clear when linear combinations of continuous variables are actually functioning as indicator variables. We’ll pick up in the next post since it is getting late and there is cider to be drunk. It should be fun and hopefully we will not encounter a latent indicator variable (LIV).
Sunday, January 25, 2009
Islands in the chemical ocean
We left you rather abruptly in the previous post, having been stung by your suggestion that we might be uncouth. However, we have decided to forgive you and continue with our tale.
We'll start with a scenario with which many of our loyal and patient readers will be familiar. You're optimising a series and have found that adding a chloro substituent at C4 of one of the phenyl rings increases the pIC50 (-log IC50 in concentration units of mol/litre) by a unit regardless of what substituents are present at C3 and C5. Those of you who've worked in drug discovery will have seen this sort of thing. Everybody in the project knows that the 4-chloro substituent is good for potency and if it goes the potency has to be clawed back from somewhere else. Just like tax.
This sort of thinking is the basis of Free-Wilson analysis. The C4 chlorine and the hydrogen of the unsubstituted C4 can each be thought of as contributing to potency. The contribution of the chlorine is a log unit greater than that of hydrogen. So you've recognised this pattern in your project data but this isn't good enough. What do you mean, "not good enough"? You have quite some nerve, M. le Crapshoot. Nothing to do with us. The Chemistry Discipline Review Committee have decided that they'd really prefer that you did this sort of thing with some equations rather than this uncultured chemical structure stuff. Also Senior Pharma Fellow (SPF) needs some equations for the presentation slides that his secretary is preparing for him. Can't you just generate some predictive models instead of being so difficult?
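Should the Committee insist on equations, the Free-Wilson idea can be sketched as a regression on indicator variables. What follows is an illustration under assumptions, not anybody's project data: the four compounds, the methyl group at C3 and all the pIC50 values are invented so that 4-chloro adds exactly one log unit whatever sits at C3.

```python
import numpy as np

# Hypothetical compounds: (C3 substituent, C4 substituent, pIC50).
# Values are invented so that 4-chloro adds exactly one log unit.
compounds = [
    ("H",  "H",  5.5),
    ("H",  "Cl", 6.5),
    ("Me", "H",  5.9),
    ("Me", "Cl", 6.9),
]

# Design matrix columns: intercept, [C3 is Me], [C4 is Cl]
X = np.array([[1.0, float(c3 == "Me"), float(c4 == "Cl")]
              for c3, c4, _ in compounds])
y = np.array([p for _, _, p in compounds])

# Ordinary least squares; each coefficient is a substituent contribution
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
baseline, c3_methyl, c4_chloro = coef
print(f"contribution of 4-chloro relative to hydrogen: {c4_chloro:.2f} log units")
```

The coefficient on the C4 indicator is the contribution of chlorine relative to hydrogen. No physical property of chlorine enters the model at all, only the fact of its presence.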
Well you didn't handle that very well, did you? Anyway stop complaining because you've got work to do. You do some modelling and you find out that the Hammett sigmas (meta and para) for the C4 substituent are both useful predictors of pIC50, as are the substituent hydrophobicity parameter and the molar mass of the substituent. Then you make a startling discovery.
The molecules with which you're building the models either have chlorine at C4 or are unsubstituted at this position.
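Why every descriptor looks useful becomes obvious if you tabulate them for such a data set. The sketch below assumes standard literature values for the chloro descriptors (sigma-meta +0.37, sigma-para +0.23, pi 0.71, molar mass 35.45) and an invented list of C4 substituents; none of it comes from real project data.

```python
import numpy as np

# Descriptor values for the C4 substituent. These are literature values
# assumed here for illustration.
DESCRIPTORS = {
    "sigma_meta": {"H": 0.00, "Cl": 0.37},
    "sigma_para": {"H": 0.00, "Cl": 0.23},
    "pi":         {"H": 0.00, "Cl": 0.71},
    "molar_mass": {"H": 1.008, "Cl": 35.45},
}

# Hypothetical training set with only two kinds of C4 substituent
c4 = ["H", "Cl", "Cl", "H", "Cl", "H", "H", "Cl"]

# One row per descriptor, one column per compound
table = np.array([[DESCRIPTORS[name][s] for s in c4] for name in DESCRIPTORS])
corr = np.corrcoef(table)  # rows are treated as variables
print(np.round(corr, 3))
```

Every pairwise correlation comes out as 1, so the four descriptors carry exactly the same information: whether or not there is a chlorine at C4.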