A number of very smart people (and smart communities) seem like they might be under the impression that the "voodoo correlations" scandal in the neuroimaging community is somehow related to recent work by Bennett et al, who used fMRI to show task-related neural activity in a dead fish.
These two things have almost nothing to do with one another.
1) The Bennett work is, in the words of a friend, "a cute way to make a point" that every fMRI paper I've ever read has failed to explicitly acknowledge. The reason they've failed to acknowledge it is that it's standard to run statistical tests equivalent to the ones that Bennett et al recommend (of course, that doesn't keep a "sizable minority" of studies from failing to do so - in Bennett's estimation between 25 and 35%; I suppose I'm just not reading those studies). Anyway, the Bennett point is simple: when you run a large number of statistical tests simultaneously, even on a random dataset, some percentage of the tests are bound to turn up "significant" purely by chance, and with some probability those significant results will randomly cluster together in 3D space. If you fail to correct the significance threshold for the large number of statistical tests performed, you get unreliable results, even if you only consider the significant results that cluster in 3D space. (It's this latter point that makes the study interesting, worthwhile, and worthy of publication in a high-profile journal, in my opinion.) Regardless, the potential issue was already well known, which perhaps explains the difficulty the authors reportedly had in publishing their work. The problem they identified is why virtually everyone everywhere uses, and for a long time has used, both multiple comparisons correction and cluster-based correction when reporting fMRI results. As Bennett et al noted in their poster, such corrections are widely available in all the major neuroimaging analysis packages and are the default in one major package, FSL.
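To make the multiple-comparisons point concrete, here is a minimal sketch - my own toy simulation, not Bennett et al's analysis; the image dimensions, subject count, and thresholds are invented for illustration. It generates pure-noise "fMRI" data, tests every voxel, and shows how chance voxels survive an uncorrected threshold and even clump into small 3D blobs:

```python
import numpy as np
from scipy import stats, ndimage

rng = np.random.default_rng(0)

# Hypothetical numbers for illustration only: a 40x48x32 "brain", 20 subjects,
# pure noise everywhere, spatially smoothed so neighbouring voxels are
# correlated (as real fMRI data are).
shape, n_subjects = (40, 48, 32), 20
noise = np.stack([ndimage.gaussian_filter(rng.standard_normal(shape), sigma=2)
                  for _ in range(n_subjects)])

# One-sample t-test at every voxel; the true effect is zero everywhere.
t, p = stats.ttest_1samp(noise, popmean=0.0, axis=0)

alpha = 0.001                      # a common uncorrected voxelwise threshold
sig = p < alpha
n_voxels = np.prod(shape)
print("uncorrected 'active' voxels:", int(sig.sum()))                    # dozens expected by chance alone
print("Bonferroni 'active' voxels:", int((p < alpha / n_voxels).sum()))  # almost always zero

# Because the data are smooth, those chance voxels tend to clump into
# contiguous 3D blobs that can look just like small "activations".
labels, n_blobs = ndimage.label(sig)
sizes = np.bincount(labels.ravel())[1:]
print("chance blobs of 3+ contiguous voxels:", int((sizes >= 3).sum()))
```

The same thing happens on real, spatially smooth data, which is exactly why corrected thresholds are the default in packages like FSL.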
2) The "Voodoo correlations" work, on the other hand, is principally about the non-independence of multiple tests. Simply put, even when you do both types of the corrections discussed above in point #1, it's not OK to take the results of that analysis (clusters in 3D space) and then run additional analyses of the same clusters in the same dataset because the data is now biased by the first analysis.
An example from the original Vul paper should make this problem clear:
We (the authors of this paper) have identified a weather station whose temperature readings predict daily changes in the value of a specific set of stocks with a correlation of r=-0.87. For $50.00, we will provide the list of stocks to any interested reader. That way, you can buy the stocks every morning when the weather station posts a drop in temperature, and sell when the temperature goes up. Obviously, your potential profits here are enormous. But you may wonder: how did we find this correlation? The figure of -.87 was arrived at by separately computing the correlation between the readings of the weather station in Adak Island, Alaska, with each of the 3315 financial instruments available for the New York Stock Exchange (through the Mathematica function FinancialData) over the 10 days that the market was open between November 18th and December 3rd, 2008. We then averaged the correlation values of the stocks whose correlation exceeded a high threshold of our choosing, thus yielding the figure of -.87. Should you pay us for this investment strategy? Probably not: Of the 3,315 stocks assessed, some were sure to be correlated with the Adak Island temperature measurements simply by chance - and if we select just those (as our selection process would do), there was no doubt we would find a high average correlation. Thus, the final measure (the average correlation of a subset of stocks) was not independent of the selection criteria (how stocks were chosen): this, in essence, is the non-independence error.
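As a rough illustration of the same logic - my own toy numbers and cutoff, not the authors' actual analysis - you can reproduce the effect with completely random data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A toy re-creation of the weather-station story with made-up random data:
# 10 trading days, 3315 "stocks", and no real relationship at all.
n_days, n_stocks = 10, 3315
temperature = rng.standard_normal(n_days)
stocks = rng.standard_normal((n_days, n_stocks))

# Correlate the temperature series with every stock separately.
r = np.array([stats.pearsonr(temperature, stocks[:, i])[0] for i in range(n_stocks)])

# The non-independent step: keep only the stocks that already cleared a strict
# cutoff, then report their average correlation as if it were an honest estimate.
selected = r[r < -0.75]
print("stocks passing the cutoff:", selected.size)              # a couple of dozen, by chance
print("their average correlation:", round(selected.mean(), 2))  # around -0.8, from pure noise
```

The impressive-looking average is guaranteed by the selection step, not by any real relationship between temperature and stock prices - and that, in essence, is the non-independence error.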
To summarize, the dead fish study is a point about first-pass analysis, which almost every paper I've ever seen does correctly. The papers that don't usually note that the result failed to pass multiple comparisons or cluster correction, and typically discuss those results with caution. "Voodoo correlations," on the other hand, is a point about non-independence in statistical tests. This has not always been done correctly, and has not always been reported clearly. Moreover, it primarily affects only a subset of correlations between brain and behavior - and not the vast majority of work in fMRI, which has to do with task-brain relationships.
Hey Chris. I'd go even farther on the salmon work: it's hilarious. How else could you get the science-loving public interested in fMRI statistical methods? Vul's work is obviously different and clearly more controversial. I initially wanted to get deeper into that in my article, but I decided to save it for a future story so that people didn't tie the salmon stuff in too directly with the voodoo stuff.
Chris - Great writeup of the differences between the multiple comparisons problem and the non-independence error. There has been a lot of confusion between the two over the last week and it is important to understand that each is a separate statistical problem in fMRI.
You mention several times in your post that almost every paper you have read does multiple comparisons correction correctly. This is great, but has not always been the case. When I was completing my training only a handful of papers properly corrected for multiple comparisons. The numbers are far better today, as 75% or more of published papers in good journals are corrected. Still, just because the majority of papers are doing it does not mean it is 'standard' yet. It is still quite possible to get a result published with a p-value cutoff of 0.001 and an 8-voxel extent threshold - doubly so if you are flexible with regard to what journal you send it to. The main argument of the salmon poster/paper is that everyone should be using multiple comparisons correction as part of their research.
Thanks to both of you for your comments. I agree with everything you've said. I finally got pushed over the edge to write this post when I started getting emails from trained experimental psychologists saying that Craig's work indicates the "voodoo correlations" problem goes deeper than we'd thought. We're going to be catching flak for that voodoo stuff for years, and I just didn't want this to get lumped in with it :)
You're quite right that these are two entirely separate issues, although I disagree that almost all papers correctly use multiple comparisons correction - a small but significant minority of the papers that I read don't. Maybe it depends on the field of neuroscience in question.
But the Voodoo Correlations issue and the multiple comparisons issue are united by one thing. They are both about statistical mistakes which are easy to understand, but still, a sizable number of fMRI papers have been falling foul of them.
This is not because fMRI researchers are stupid, but simply because we don't tend to think about statistical issues enough. The early fMRI pioneers were well versed in the physics and mathematics of what they were doing, but the huge number of neuroscientists who started using fMRI in the past few years tend to be much less knowledgeable about that side of it.
The Vul excerpt sounds akin to Taleb's "Fooled by Randomness", an excellent read for anyone who prefers scientific thinking over anecdote - that is, anyone who prefers listening to non-random signals over random statistical noise.
http://www.amazon.com/Fooled-Randomness-Hidden-Chance-Markets/dp/140006…
Wow, I read both these papers without really appreciating the difference. The Vul work takes a bit of chewing (for a non-imager, anyway). Thanks for the summary!