fMRI of a dead salmon: Why dead fish have almost nothing to do with "voodoo correlations" in neuroimaging

A number of very smart people (and smart communities) seem to be under the impression that the "voodoo correlations" scandal in the neuroimaging community is somehow related to recent work by Bennett et al, who used fMRI to show task-related neural activity in a dead fish.

These two things have almost nothing to do with one another.

1) The Bennett work is, in the words of a friend, "a cute way to make a point" that every fMRI paper I've ever read has failed to explicitly acknowledge. The reason they've failed to acknowledge it is that it's standard to run statistical tests equivalent to the ones that Bennett et al recommend (of course, that doesn't keep a "sizable minority" of studies from failing to do so - in Bennett's estimation, between 25% and 35%; I suppose I'm not reading those studies). Anyway, the Bennett point is simple: when you run a large number of statistical tests simultaneously, even on a random dataset, you're bound to find some percentage of tests that turn up "significant" just by chance, and with some probability those significant results will randomly cluster together in 3D space. If you fail to correct the significance threshold for the large number of statistical tests performed, then you get unreliable results, even if you only consider those significant results that cluster in 3D space. (It's this latter point that makes the study interesting, worthwhile, and worthy of publication in a high-profile journal, in my opinion.) Regardless, the potential issue was already well known, which perhaps explains the difficulty the authors have reportedly had in publishing their work. The problem they identified is why virtually everyone uses, and for a long time has used, both multiple comparisons correction and cluster-based correction when reporting fMRI results. As Bennett et al noted in their poster, such corrections are available in all the major neuroimaging analysis packages and are the default in one major package, FSL.
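To make the multiple comparisons point concrete, here is a minimal simulation sketch. It is not the analysis Bennett et al ran: the number of "voxels" (10,000), the alpha level (0.05), the use of a simple z-test, and the choice of Bonferroni correction are all assumptions picked for illustration, and the function name is hypothetical. It only covers the voxelwise threshold, not cluster-based correction.

```python
import math
import random


def simulate_null_voxels(n_voxels=10000, alpha=0.05, seed=0):
    """Count 'significant' voxels in pure noise, with and without correction.

    All parameters are illustrative assumptions, not values from the salmon study.
    """
    rng = random.Random(seed)

    # Each voxel gets a z-statistic drawn from the null distribution:
    # there is no real signal anywhere.
    z_scores = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]

    # Two-sided p-value for a standard normal z-statistic.
    p_values = [math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores]

    # Uncorrected: compare every p-value to alpha directly.
    uncorrected = sum(p < alpha for p in p_values)

    # Bonferroni: divide alpha by the number of tests performed.
    bonferroni = sum(p < alpha / n_voxels for p in p_values)

    return uncorrected, bonferroni


uncorrected_hits, bonferroni_hits = simulate_null_voxels()
print("uncorrected 'active' voxels:", uncorrected_hits)
print("Bonferroni-corrected 'active' voxels:", bonferroni_hits)
```

With no correction, roughly 5% of the noise voxels should cross the threshold; with the corrected threshold, almost none should. Bonferroni is conservative for spatially smooth fMRI data, which is one reason packages offer other family-wise and cluster-level corrections, but the sketch is only meant to show why some correction is needed.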

2) The "Voodoo correlations" work, on the other hand, is principally about the non-independence of multiple tests. Simply put, even when you apply both types of correction discussed above in point #1, it's not OK to take the results of that analysis (clusters in 3D space) and then run additional analyses of the same clusters in the same dataset, because the data is now biased by the first analysis.

An example from the original Vul paper should make this problem clear:

We (the authors of this paper) have identified a weather station whose temperature readings predict daily changes in the value of a specific set of stocks with a correlation of r=-0.87. For $50.00, we will provide the list of stocks to any interested reader. That way, you can buy the stocks every morning when the weather station posts a drop in temperature, and sell when the temperature goes up. Obviously, your potential profits here are enormous. But you may wonder: how did we find this correlation? The figure of -.87 was arrived at by separately computing the correlation between the readings of the weather station in Adak Island, Alaska, with each of the 3315 financial instruments available for the New York Stock Exchange (through the Mathematica function FinancialData) over the 10 days that the market was open between November 18th and December 3rd, 2008. We then averaged the correlation values of the stocks whose correlation exceeded a high threshold of our choosing, thus yielding the figure of -.87. Should you pay us for this investment strategy? Probably not: Of the 3,315 stocks assessed, some were sure to be correlated with the Adak Island temperature measurements simply by chance - and if we select just those (as our selection process would do), there was no doubt we would find a high average correlation. Thus, the final measure (the average correlation of a subset of stocks) was not independent of the selection criteria (how stocks were chosen): this, in essence, is the non-independence error.
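The weather station example can be sketched as a small simulation. This is not Vul et al's code or data: every series here is random noise, the selection threshold of 0.7 is an assumption (the passage says only "a high threshold of our choosing"), and the function names are hypothetical. The sketch averages the correlations of the selected "stocks" on the same data used to select them, then re-measures those stocks on fresh, independent data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def non_independence_demo(n_stocks=3315, n_days=10, threshold=0.7, seed=0):
    """Select noise 'stocks' by correlation, then average the same correlations.

    threshold is an assumed value; n_stocks and n_days follow the quoted example.
    """
    rng = random.Random(seed)

    def noise_series():
        return [rng.gauss(0.0, 1.0) for _ in range(n_days)]

    temperature = noise_series()
    stocks = [noise_series() for _ in range(n_stocks)]

    # Step 1 (selection): keep stocks strongly negatively correlated with temperature.
    r_values = [pearson(temperature, s) for s in stocks]
    selected = [r for r in r_values if r < -threshold]
    if not selected:
        raise ValueError("no stocks passed the threshold; lower it or change the seed")

    # Step 2 (non-independent measure): average the correlations that drove selection.
    biased_mean = sum(selected) / len(selected)

    # Step 3 (independent check): re-measure the selected stocks on new days.
    # Since everything is noise, new data for a stock is simply a fresh noise series.
    new_temperature = noise_series()
    fresh = [pearson(new_temperature, noise_series()) for _ in selected]
    independent_mean = sum(fresh) / len(fresh)

    return len(selected), biased_mean, independent_mean


n_selected, biased_mean, independent_mean = non_independence_demo()
print("stocks selected:", n_selected)
print("mean r on the selection data:", round(biased_mean, 2))
print("mean r on independent data:", round(independent_mean, 2))
```

The average on the selection data is guaranteed to be below -0.7 because only correlations below -0.7 were allowed into it, while the average on independent data should sit near zero. That gap is the non-independence error in miniature.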

To summarize, the dead fish study makes a point about first-pass analysis, which almost every paper I've ever seen does correctly. The papers that don't do it correctly always note that the result failed to pass multiple comparisons or cluster correction, and typically discuss those results with caution. On the other hand, "voodoo correlations" makes a point about non-independence in statistical tests. This has not always been handled correctly, and has not always been reported clearly. Moreover, it primarily affects only a subset of correlations between brain and behavior - and not the vast majority of work in fMRI, which has to do with task-brain relationships.

Hey Chris. I'd go even farther on the salmon work: it's hilarious. How else could you get the science-loving public interested in fMRI statistical methods? Vul's work is obviously different and clearly more controversial. I initially wanted to get deeper into that in my article, but I decided to save it for a future story so that people didn't tie the salmon stuff in too directly with the voodoo stuff.

Chris - Great writeup of the differences between the multiple comparisons problem and the non-independence error. There has been a lot of confusion between the two over the last week and it is important to understand that each is a separate statistical problem in fMRI.

You mention several times in your post that almost every paper you have read does multiple comparisons correction correctly. This is great, but has not always been the case. When I was completing my training only a handful of papers properly corrected for multiple comparisons. The numbers are far better today, as 75% or more of published papers in good journals are corrected. Still, just because the majority of papers are doing it does not mean it is 'standard' yet. It is still quite possible to get a result published with a p-value cutoff of 0.001 and an 8 voxel extent threshold - doubly so if you are flexible with regard to what journal you send it to. The main argument of the salmon poster/paper is that everyone should be using multiple comparisons correction as part of their research.

Thanks to both of you for your comments. I agree with everything you've said. I finally got pushed over the edge to write this post when I started getting emails from trained experimental psychologists saying that Craig's work indicates the "voodoo correlations" problem goes deeper than we'd thought. We're going to be catching flak for that voodoo stuff for years, and I just didn't want this to get lumped in with it :)

You're quite right that these are two entirely separate issues, although I disagree that almost all papers correctly use multiple comparisons correction - a small but significant minority of papers that I read don't. Maybe it depends on the field of neuroscience in question.

But the Voodoo Correlations issue and the multiple comparisons issue are united by one thing. They are both about statistical mistakes which are easy to understand, but still, a sizable number of fMRI papers have been falling foul of them.

This is not because fMRI researchers are stupid, but simply because we don't tend to think about statistical issues enough. The early fMRI pioneers were well versed in the physics and mathematics of what they were doing, but the huge number of neuroscientists who started using fMRI in the past few years tend to be much less knowledgeable about that side of it.

Wow, I read both these papers without really appreciating the difference. The Vul work takes a bit of chewing (for a non-imager, anyway). Thanks for the summary!