A number of very smart people (and smart communities) seem to be under the impression that the “voodoo correlations” scandal in the neuroimaging community is somehow related to recent work by Bennett et al., who used fMRI to show task-related neural activity in a dead fish.

These two things have almost nothing to do with one another.

1) The Bennett work is, in the words of a friend, “a cute way to make a point” that every fMRI paper I’ve ever read has failed to explicitly acknowledge. The reason they’ve failed to acknowledge it is that *it’s standard to run statistical tests equivalent to the ones that Bennett et al. recommend* (of course, that doesn’t keep a “sizable minority” of studies from failing to do so: by Bennett’s estimate, between 25% and 35%; I suppose I’m not reading those studies). Anyway, the Bennett point is simple: when you run a large number of statistical tests simultaneously, even on a random dataset, some percentage of them are bound to turn up “significant” purely by chance, and with some probability those significant results will randomly cluster together in 3D space. If you fail to correct the significance threshold for the large number of tests performed, you get unreliable results, **even if you only consider those significant results that cluster in 3D space.** (It’s this latter point that makes the study interesting, worthwhile, and worthy of publication in a high-profile journal, in my opinion.) Regardless, the potential issue was already well known, which perhaps explains the difficulty the authors have reportedly had in publishing their work. The problem they identified is why virtually everyone uses, and has long used, *both* multiple comparisons correction and cluster-based correction when reporting fMRI results. As Bennett et al. noted in their poster, such corrections are available in all the major neuroimaging analysis packages and are the default in one major package, FSL.
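For readers who want to see the multiple comparisons problem in numbers, here is a minimal sketch in Python. It is not Bennett et al.’s analysis: the voxel count, scan count, and alpha are made-up illustrative values, the test is a simple known-variance z-test on pure noise, the correction shown is plain Bonferroni, and spatial clustering is not modeled at all.

```python
import math
import random


def count_false_positives(n_voxels=5000, n_scans=20, alpha=0.05, seed=0):
    """Test every 'voxel' of a pure-noise dataset for a nonzero mean.

    All parameter values are illustrative assumptions, not taken from any study.
    Returns (hits at the uncorrected threshold, hits at the Bonferroni threshold).
    """
    rng = random.Random(seed)
    bonferroni_alpha = alpha / n_voxels  # Bonferroni: divide alpha by the number of tests
    uncorrected = 0
    corrected = 0
    for _ in range(n_voxels):
        # Pure noise: the true mean of every voxel is exactly zero.
        samples = [rng.gauss(0.0, 1.0) for _ in range(n_scans)]
        # z-statistic for the sample mean, with the variance known to be 1.
        z = (sum(samples) / n_scans) * math.sqrt(n_scans)
        # Two-sided p-value under the standard normal distribution.
        p = math.erfc(abs(z) / math.sqrt(2.0))
        if p < alpha:
            uncorrected += 1
        if p < bonferroni_alpha:
            corrected += 1
    return uncorrected, corrected


uncorrected, corrected = count_false_positives()
print(f"'significant' voxels, uncorrected: {uncorrected}")
print(f"'significant' voxels, Bonferroni:  {corrected}")
```

With these settings you should expect roughly alpha × n_voxels (about 250) “active” voxels in pure noise at the uncorrected threshold, and almost always none after correction.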

2) The “voodoo correlations” work, on the other hand, is principally about the non-independence of multiple tests. Simply put, even when you apply both types of correction discussed in point #1, it’s not OK to take the results of that analysis (clusters in 3D space) and then run additional analyses on the *same* clusters in the *same* dataset, because the first analysis has already selected those clusters for showing the effect, which biases the second analysis.

An example from the original Vul paper should make this problem clear:

We (the authors of this paper) have identified a weather station whose temperature readings predict daily changes in the value of a specific set of stocks with a correlation of r=-0.87. For $50.00, we will provide the list of stocks to any interested reader. That way, you can buy the stocks every morning when the weather station posts a drop in temperature, and sell when the temperature goes up. Obviously, your potential profits here are enormous. But you may wonder: how did we find this correlation? The figure of -.87 was arrived at by separately computing the correlation between the readings of the weather station in Adak Island, Alaska, with each of the 3315 financial instruments available for the New York Stock Exchange (through the Mathematica function FinancialData) over the 10 days that the market was open between November 18th and December 3rd, 2008. We then averaged the correlation values of the stocks whose correlation exceeded a high threshold of our choosing, thus yielding the figure of -.87. Should you pay us for this investment strategy? Probably not: Of the 3,315 stocks assessed, some were sure to be correlated with the Adak Island temperature measurements simply by chance – and if we select just those (as our selection process would do), there was no doubt we would find a high average correlation. Thus, the final measure (the average correlation of a subset of stocks) was not independent of the selection criteria (how stocks were chosen): this, in essence, is the non-independence error.
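The same effect is easy to reproduce with random numbers. The sketch below is not Vul et al.’s code or data: it replaces the weather station and the stocks with independent Gaussian noise, and the selection threshold of r < -0.7 is an arbitrary choice of mine. It selects the “stocks” that happen to correlate strongly with the “temperature”, averages their correlations, and then checks the same stocks against a fresh, independent set of days.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_selection(n_stocks=3315, n_days=10, threshold=-0.7, seed=0):
    """Select noise 'stocks' by their correlation with noise 'temperature'.

    The threshold is a hypothetical value chosen for illustration.
    Returns (correlations of the selected stocks on the selection data,
             correlations of the same stocks on fresh, independent data).
    """
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0.0, 1.0) for _ in range(n_days)]

    temps = noise()        # days used for selection
    fresh_temps = noise()  # independent days used for the honest check
    selected_r = []
    fresh_r = []
    for _ in range(n_stocks):
        r = pearson_r(temps, noise())
        if r < threshold:
            # Non-independent estimate: measured on the data used to select.
            selected_r.append(r)
            # Independent estimate: same "stock", brand-new noise.
            fresh_r.append(pearson_r(fresh_temps, noise()))
    return selected_r, fresh_r


selected_r, fresh_r = simulate_selection()
mean_selected = sum(selected_r) / len(selected_r)
mean_fresh = sum(fresh_r) / len(fresh_r)
print(f"stocks selected: {len(selected_r)}")
print(f"mean r on the selection data: {mean_selected:.2f}")
print(f"mean r on independent data:   {mean_fresh:.2f}")
```

The mean correlation on the selection data is below the threshold by construction, since only stocks that passed it are averaged. On independent data the same stocks should average close to zero, because there was never any real relationship to find.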

To summarize, the dead fish study makes a point about first-pass analysis, which almost every paper I’ve ever seen does correctly. The papers that don’t do so always note that the result failed to pass multiple comparisons or cluster correction, and typically discuss those results with caution. On the other hand, “voodoo correlations” is a point about non-independence in statistical tests. Follow-up analyses of this kind have not always been done correctly, and have not always been reported clearly. Moreover, the problem primarily affects only a subset of correlations between brain and behavior, not the vast majority of work in fMRI, which has to do with task-brain relationships.