Genetic Future

New Scientist trumpets the discovery of “the first placebo gene”. The study in question is here.

I usually don’t comment on this type of study, but this time the hype is just too much for me: New Scientist describes the study as “a milestone in the quest to understand” the placebo effect; an article in ScienceNow quotes a psychiatrist saying that “the findings could have major implications for research design”. The article itself certainly doesn’t talk down its results, with the first sentence of the discussion stating:

The present study demonstrates that the magnitude of the placebo response [...] is tied to attenuated amygdala excitability, which in turn is linked to serotonergic genetic variation.

The problem? The study examined just 25 subjects, and if there’s one clear lesson from the history of candidate gene association studies it’s that such tiny studies are essentially worthless: systematic reviews of the field (e.g. here, here and here) have consistently found that the majority of such associations are never replicated, suggesting that positive results in small studies are substantially more likely to arise through a combination of chance, error and publication bias than through a genuine causal link.

It’s only relatively recently that genetic association studies have come of age, with the advent of agnostic genome-wide association studies, massive sample sizes, rigorous statistical frameworks and the use of independent replication cohorts. Unfortunately, it appears that such novelties haven’t yet permeated Uppsala University’s Department of Psychology – but that hasn’t stopped their study from generating media attention, in publications that should really have known better.

So don’t believe the hype: as a good rule of thumb, if a genetic association study contains fewer than 100 subjects, it’s not a “milestone” with “major implications” – in fact, you might as well simply pretend it doesn’t exist at all. (Many studies with more than 100 subjects are also crap, but at least there’s a chance they’re capturing a genuine causal variant.) I’m deadly serious about this. The field is so littered with the stinking carcasses of unreplicated candidate gene associations that it’s a reasonable default to simply assume that any small, unreplicated study is false.

Now, if only there was a way to get the science journalists responsible to internalise that little rule of thumb…


Comments

  1. #1 Don Monroe
    December 3, 2008

    Thanks for the perspective. Your rule of thumb will be useful to me as a science journalist.

    Question: would you disregard a study larger than 100 if there is no independent replication? Or would you consider a small study worthy of some attention if it were replicated?

  2. #2 Andrew Yates
    December 3, 2008

    We’re at the apex of a hype cycle. Everybody is buying hype, and the prosaic truth is drowned out in the noise.

    I estimate that we’re about 3 to 6 months away from the flagstaff “Has Genomics gone too far?” headline, and we’ll see a nasty collapse in public opinion when deCODE and Navigenics report bad news probably this year.

    Everybody will be buying the truth then. So, keep faith! The Internet makes a track record of honesty (or otherwise) obvious.

  3. #3 Andrew Yates
    December 3, 2008

    “probably this year”

    err, that is, 2009.

  4. #4 Gregory Earl
    December 4, 2008

    Unfortunately, it appears that such novelties haven’t yet permeated Uppsala University’s Department of Psychology …

    Or Uppsala Imanet, GE Healthcare, the Department of Pharmacology at Goteborg University, the GlaxoSmithKline Medicine Research Centre in Verona, Italy, Quintiles AB Phase I Services in Uppsala, the Department of Neuroscience at Uppsala University and the Department of Biochemistry and Organic Chemistry at Uppsala University.

    Why pick on the affiliation of the first author and let the others get away?

  5. #5 Daniel MacArthur
    December 4, 2008

    The lead (and corresponding) author, the senior author and half of the middle authors are from the Department of Psychology – I figure it’s reasonable to assign them the bulk of the responsibility.

  6. #6 Sciphu
    December 4, 2008

    Daniel, your criticism is fair: one shouldn’t draw conclusions before larger studies are performed and the results reproduced. That said, another news report gave a study population of 108. I couldn’t get the full-text article, but if that figure is correct, then the study isn’t as bad as you say it is.

  7. #7 Neuroskeptic
    December 4, 2008

    Science journalists should note that the 100 figure is completely arbitrary (as I’m sure Daniel would agree) – whether a study is big enough depends completely upon what the study is. This was an fMRI study which means that getting an n of 100 would have been prohibitively expensive but, more importantly, un-necessary if the gene’s effect on brain function was substantial enough.

    I’m not defending this study, and I will probably blog about it myself, but sample size isn’t everything. I strongly suspect it isn’t the main problem with this study.

  8. #8 Daniel MacArthur
    December 4, 2008

    Sciphu,

    There were 108 patients in the original study, but this report is a targeted analysis of the 25 patients assigned to the placebo group – comparing 10 “responders” to 15 “non-responders”. So the sample size for this specific association is 25, not 108.

  9. #9 Daniel MacArthur
    December 4, 2008

    Don,

    The 100 figure is just my arbitrary rule of thumb below which a study should be regarded as completely invisible. Above that magic number the weight I would give to a finding depends hugely on the methodology used. For instance, arbitrary sub-division of a cohort into smaller groups for the study raises loud alarm bells, as does failure to correct for multiple testing (which appears to be a problem with this study as well), or to adequately address the possibility of population structure.

    Independent replication is extremely important (which is why most major journals effectively require it before an association study is accepted), but it has to be done well: the replication cohort needs to be reasonably well-powered, the same variant must be tested, and the effect has to be in the same direction as in the original study. Lack of replication isn’t enough to make me completely disregard a study, but I don’t actually start to believe it until I’ve seen a good replication study (I should note that I apply this to my own work as well – I didn’t really believe the association between ACTN3 and sprint performance until after the first two replications).

    The problem is basically publication bias: positive studies get published (and publicised), and negative studies are either never submitted or don’t get past the editors. That creates a strong incentive for researchers to slice and dice their data in various ways until they find something that will make a headline. This happens for large studies as well as small ones, but it’s a hell of a lot easier to stumble across a positive finding by chance if your sample size is low.

    John Ioannidis has written a series of excellent articles on bias in general in the scientific literature, with a particular focus on genetic association studies – PubMed “ioannidis jp” for details.

  10. #10 Daniel MacArthur
    December 4, 2008

    Neuroskeptic,

    Sure, every study is limited by the samples available. I’ve got no problem with the researchers throwing a few genetic assays at their subjects to see if anything interesting fell out. What I do have a problem with is the notion that this study actually demonstrates anything, when in fact at best it is a slight hint of something that might be worth pursuing in a larger, better-designed study down the road. It should have been a line or two in the results section, not the title of the paper!

    It’s also worth noting that the authors could probably have found samples very easily for a replication study for the placebo-gene link (but admittedly not for the PET scan-gene link). How many studies out there have collected DNA from samples used in a placebo-controlled trial? How hard would it have been to find one sufficiently large cohort to validate the association between placebo response and this particular polymorphism? Sure, it would require establishing a collaboration and doing a bunch of genotyping, but that’s all.

    Would it have been worth the effort? That depends on whether the authors are interested in figuring out whether this association is actually real, or if they just wanted their names in New Scientist.

  11. #11 Daniel MacArthur
    December 4, 2008

    One final point. Neuroskeptic, you said:

    …getting an n of 100 would have been prohibitively expensive but, more importantly, un-necessary if the gene’s effect on brain function was substantial enough.

    Leaving aside the issue above regarding the ease with which the placebo response association could have been replicated, let me ask you this: given what is now known about the genetic complexity of most human traits, what would you estimate to be the prior probability that a single polymorphism explains 30% of the variance in placebo response (as this study claims, and indeed as would probably be required for this study to show significance with such a tiny sample size)?
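As a rough check on the point above, here is a minimal sketch of how large an effect a sample this small must show before it can reach significance at all. It assumes, purely for illustration, that the association is tested as a simple Pearson correlation (the paper's actual test may well differ), takes n = 25 from the thread, and hard-codes the standard two-sided 5% critical t value for 23 degrees of freedom. The function name is hypothetical.

```python
# Two-sided 5% critical value of Student's t for 23 degrees of freedom
# (standard table value; hard-coded so the sketch needs no dependencies).
T_CRIT_DF23 = 2.069


def min_significant_r2(n: int, t_crit: float) -> float:
    """Smallest squared correlation that reaches significance with n samples.

    Inverts t = r * sqrt(n - 2) / sqrt(1 - r^2) to solve for r^2.
    """
    df = n - 2
    return t_crit ** 2 / (t_crit ** 2 + df)


n_subjects = 25  # placebo-group size quoted earlier in the thread
r2_needed = min_significant_r2(n_subjects, T_CRIT_DF23)
print(f"n = {n_subjects}: need r^2 >= {r2_needed:.3f} for P < 0.05")
```

Under those assumptions, any significant result from 25 subjects implies a variant explaining roughly 16% or more of the variance, so a small positive study can only ever report a very large effect.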

  12. #12 Blake Stacey
    December 4, 2008

    I usually don’t comment on this type of study, but this time the hype is just too much for me: New Scientist describes the study

    Ah, say no more. We’ve been down that road.

  13. #13 Sciphu
    December 4, 2008

    Daniel. Yes, I read the eSciencenews report where only the 108 was mentioned; I see from the ScienceNow piece that they are indeed reporting on the placebo subgroup. But, and please correct me if I am completely wrong here, if 40% are positive, you do not need a very large cohort to get a good positive predictive value. If so, is 25 still too few?

  14. #14 Daniel MacArthur
    December 4, 2008

    Blake,

    Your link filled me with soul-crushing despair for the future of the non-scientist’s view of science. Thanks!

    Sciphu,

    40% of 25 people = 10 people (and in fact one of the responders didn’t have a DNA sample available, so the real sample size was 9 responders and 15 non-responders).

    That’s (just) large enough to achieve formal statistical significance – the final P value was 0.04, meaning that a result at least this strong would be expected in 4 out of 100 identical trials by chance alone (it’s unclear to me whether or not the authors corrected for multiple testing of four predictive variables; if not, the probability of obtaining a P value of 0.04 or lower for at least one variable rises to roughly 15 out of 100 trials, assuming the four tests are independent).

    I’ll grant you that it’s possible that the association is genuine, but here’s what I see as the default explanation. We know that the authors took these samples from two somewhat larger studies looking at genotype, PET and clinical response to drugs in patients with seasonal affective disorder [added in edit: should have been social anxiety disorder] performed back in 2003-2005. They presumably tested a whole bunch of different hypotheses: whether genotype affects PET activity in a whole bunch of different areas of the brain, whether genotype affects response to drugs, whether PET activity correlates with drug response, whether males differ from females, young from old, etc. With each of these hypotheses the probability that one of them would provide a statistically significant result purely by chance increased. In this case, the correlation between one particular genetic polymorphism and the placebo response was the one that came up (through a combination of chance, error and bias), and that’s the association that got published. Hey presto: headlines, glory, and groupies for the authors.

    Sound cynical? Look at the data. If formal statistical significance was an appropriate guide to whether or not an association was real, and if scientists only ever did studies that were adequately powered to detect the effects they were interested in, then you’d expect to see failure of replication for only a small fraction of reported genetic associations – but that’s not the case. Instead, the majority of candidate gene association studies are not convincingly replicated (96%, in the survey I just linked to).

    The numbers are so dismal that I feel quite confident ignoring studies with sample sizes below 100 – they’re simply not much better than random noise at identifying genuine associations. If that means ignoring the occasional real finding (guilt by association!), so be it – I’m happy to be proved wrong by the replication studies, but that really doesn’t happen very often.
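The multiple-testing arithmetic in the comment above can be sketched in a few lines. This is only an illustration: it assumes the tests are independent, which real analyses of overlapping data usually are not, and the function name is made up for the example.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


alpha = 0.04  # per-test threshold, matching the reported P value
for n_tests in (1, 4, 10, 20):
    exact = familywise_error_rate(alpha, n_tests)
    bound = min(1.0, alpha * n_tests)  # simple additive upper bound
    print(f"{n_tests:>2} tests: exact {exact:.3f}, upper bound {bound:.2f}")
```

For four tests the exact figure is about 0.15, just under the simple 4 × 0.04 = 0.16 bound, and by twenty tests the chance of at least one spurious "hit" is better than even.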

  15. #15 gillt
    December 4, 2008

    I’m busily searching for candidate genes for specific tumor types and am having a hard time. I know that this is only confounded when the phenotype happens to be a complex behavior such as sociopathy.

    I often wonder, if you’re working on behavioral genetics sans cell culture or sequencer…then what are you really doing?

  16. #16 Techskeptic
    December 4, 2008

    Can’t we somehow grade studies based on the mechanics?

    It drives me crazy that non-blinded studies with as few as 12 subjects make the news. If the mechanics of a study got a grade, more people would understand that a study is C grade (or some such thing) rather than a useful A or B grade.

  17. #17 c
    December 4, 2008

    I think you all missed the fact that it is not just an association between placebo response and genotype, but that it is mediated by amygdala blood flow. So the story is a little more coherent than a mere association study.

  18. #18 Daniel MacArthur
    December 4, 2008

    c,

    Sure, it’s a nice, coherent story – that’s why it got published. That doesn’t make it any more likely to be true. The functional brain imaging literature is even more jam-packed with small, completely unreplicated (but presumably coherent) association studies than the genetic literature.

  19. #19 Mark Pallen
    December 5, 2008

    This reminds me of the time a few years back when New Scientist announced in their Christmas edition that breakthroughs anticipated in the coming year included “discovery of the gene responsible for believing in genetic determinism”! But that time they were not serious!

  20. #20 c
    December 5, 2008

    To me it seems plausible that a gene variant regulating serotonin production is relevant in an anxiety syndrome. Further, the amygdala is clearly involved in anxiety, and the placebo response reported in this study was achieved under double-blind conditions in the context of a real clinical trial.
    Of course it has to be replicated; I am sure the authors agree with that too.

  21. #21 Neuroskeptic
    December 7, 2008

    Well, having read the paper I have to say that the sample size is not the biggest problem with it. The biggest problem is that it didn’t measure the placebo effect at all. Full critique here

  22. #22 Neuroskeptic
    December 7, 2008

    That is to say here:
    http://neuroskeptic.blogspot.com/2008/12/lessons-from-placebo-gene.html
    Why do hyperlinks sometimes not work on ScienceBlogs?

  23. #23 Daniel MacArthur
    December 10, 2008

    Somehow I missed your comment, Neuroskeptic, but I’ve linked your post now.

    As for hyperlinks – you didn’t have quotation marks around the URL in the first comment. I fixed it for you.

  24. #24 Christopher Mims
    January 8, 2009

    Thanks for the public service. This is one science journalist who has now internalized that rule of thumb.

  25. #25 Anonymous Regressor
    January 16, 2009

    A better rule of thumb than the 100 samples is the sample-to-variable ratio (or, for binary outcomes, events per variable). Spurious results and overfitting are much more likely when there are fewer than 10 samples per variable. Note that this is the total number of examined variables throughout the analysis, not just the number of variables in the final model. See the book Harrell (2001) “Regression Modeling Strategies” for a good discussion of this.

    Note that it is possible to build stable models when there are more variables than samples (through things like penalization/regularization/shrinkage methods), as is common in microarray studies, but even after applying all the statistical wizardry you must face the fact that in such scenarios you’re limited to simple models; otherwise you’ll severely overfit.
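The samples-per-variable rule of thumb is easy to apply mechanically. The sketch below is only an illustration: the function names are hypothetical, the threshold of 10 is the figure from the comment above, and the inputs are the numbers mentioned elsewhere in this thread (24 genotyped subjects, 9 responders, four predictive variables). The true count of examined variables may be higher.

```python
def samples_per_variable(n_samples: int, n_variables_examined: int) -> float:
    """Ratio of samples (or events) to every variable examined in the analysis."""
    if n_variables_examined <= 0:
        raise ValueError("at least one variable must have been examined")
    return n_samples / n_variables_examined


def passes_rule_of_thumb(n_samples: int, n_variables_examined: int,
                         minimum_ratio: float = 10.0) -> bool:
    """True when the ratio meets the 10-per-variable rule of thumb."""
    return samples_per_variable(n_samples, n_variables_examined) >= minimum_ratio


# Figures quoted in this thread: 24 genotyped subjects, 9 of them responders,
# and four predictive variables tested.
print(samples_per_variable(24, 4), passes_rule_of_thumb(24, 4))  # all samples
print(samples_per_variable(9, 4), passes_rule_of_thumb(9, 4))    # events only
```

On either count the ratio falls well short of 10, and it can only get worse once unreported variables are included.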

  26. #26 Daniel MacArthur
    January 16, 2009

    That’s a good point, but it can be very difficult to figure out “the total number of examined variables throughout the analysis” – since it is (sadly) standard practice to only publish the models that actually generate a positive result. That means there can be any number of hidden variables that aren’t accounted for in the published work.

    As a particularly egregious example, check out the paper on a genetic association between the Duffy blood group and HIV susceptibility that I wrote about last year. The authors clearly tested multiple candidate markers, but only one was significant (probably due to confounding by population structure rather than a genuine association); they then went on to hide the non-significant markers in their publication by rebadging them as “ancestry markers”. This sort of behaviour is far more common than it should be, although it must be said that most researchers hide it better!

  27. #27 Anonymous Regressor
    January 18, 2009

    That’s a good point, but it can be very difficult to figure out “the total number of examined variables throughout the analysis” – since it is (sadly) standard practice to only publish the models that actually generate a positive result. That means there can be any number of hidden variables that aren’t accounted for in the published work.

    I completely agree. There’s such a huge selection bias towards publishing good results that there’s no incentive to do it better. Even when you want to be honest about the process, it’s difficult to avoid shooting yourself in the foot. Things like cross-validation or bootstrapping can help a lot.

    One small step is for journals to mandate bootstrapping the results and including source code for reproducible results.

    Having said that, I think it would be impossible to stop all people who want to data dredge and to find non-existent “results” no matter what.
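For readers unfamiliar with bootstrapping, here is a minimal sketch of a percentile bootstrap confidence interval for a mean, using only the Python standard library. The numbers are made up for illustration and are NOT data from the study; the function name, resample count and seed are likewise arbitrary choices for the example.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement, recording the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_idx = int((1.0 - level) / 2.0 * n_resamples)
    upper_idx = int((1.0 + level) / 2.0 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


# Hypothetical symptom-score changes, invented for illustration only.
hypothetical_changes = [-12, -3, 5, -8, 0, -15, 2, -6, -1, -9, 4, -7]
low, high = bootstrap_ci(hypothetical_changes)
observed = statistics.fmean(hypothetical_changes)
print(f"mean {observed:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With only a dozen values the interval comes out wide, which is the point: resampling makes the instability of a small-sample estimate visible instead of hiding it behind a single P value.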

  28. #28 Brian
    January 26, 2009

    To spit out a particular number (100) as passing muster and suggest everyone follow it as a rule of thumb is just criminal. There’s a lot more to consider here.

  29. #29 Daniel MacArthur
    January 27, 2009

    Brian,

    See my comment to Don above. I agree that there are many facets to a genetic association study that must be taken into consideration; these would all play a role in evaluating studies with a reasonable sample size. However, the history of the field suggests that the signal-to-noise ratio among studies with fewer than 100 participants is simply too low to even bother regarding such studies as saying anything until they have achieved independent replication.

    Sure, it’s an arbitrary figure, but given the size of the genetic association literature and the large proportion of studies that are false positives you simply have to draw a line somewhere.

    Note that I am not arguing that small association studies are completely worthless – they are fine as pilot studies to identify possible associations to follow up in more detail in larger samples. But as isolated findings they should never be treated seriously, and certainly never be granted the level of media attention that this study was.