Even More Voodoo

Over at Mind Matters, I recently interviewed Matthew Lieberman, a social neuroscientist at UCLA. The previous week I asked Ed Vul, lead author of the "Voodoo Correlations" paper, a few questions, and I wanted to make sure I gave some of the scientists he criticized a chance to rebut the accusations. (Here's some excellent background reading on the Voodoo controversy.) I think Lieberman makes some excellent points:

The argument that Vul and colleagues put forward in their paper is that correlations observed in social neuroscience papers are impossibly high. There's a metric (the product of the reliabilities of the two variables) that determines just how high a correlation can be observed between two variables. They suggest that because, on average, this metric allows correlations as high as 0.74, social neuroscientists should never see correlations higher than that.

Given the gravity of the claim, it's important to get this [figure] right, but they do not. Here's their mistake: it's not the average of this metric that determines what can be observed in a study, but rather the metric for that particular study or, at the very least, the metric estimated from prior use of the actual measures in that study. Just because the average price of groceries in a supermarket is $3 does not mean you cannot find a $12 item. In fact, a study that I'm an author on (and is a major target in the Vul et al. paper) is a perfect example. The reliability of the self-report measure in our study is far higher than the average they report, allowing for higher observed correlations. They knew this [fact], but presented our study as violating the "theoretical upper bound" anyway.
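For readers who want the arithmetic behind this exchange, here is a minimal sketch assuming the standard attenuation bound, under which an observed correlation cannot exceed the square root of the product of the two measures' reliabilities. The reliability values are illustrative: the first pair is chosen only so the bound reproduces the 0.74 average quoted above, and the study-specific value of 0.95 is hypothetical, not a number from Lieberman's paper.

```python
import math

def max_observable_r(reliability_x, reliability_y):
    """Attenuation bound: an observed correlation cannot exceed the
    square root of the product of the two measures' reliabilities."""
    return math.sqrt(reliability_x * reliability_y)

# Illustrative "average" reliabilities that reproduce the 0.74 figure
print(max_observable_r(0.70, 0.80))   # ~0.75 -- the average ceiling

# A hypothetical study whose self-report measure is more reliable than average
print(max_observable_r(0.70, 0.95))   # ~0.82 -- the ceiling moves with the study
```

The second call is Lieberman's point in miniature: the ceiling that matters is computed from the reliabilities of the measures actually used in a given study, not from field-wide averages.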

Their second major conceptual point is that numerous social neuroscience authors were making a non-independence error. Ed Vul gives a nice example of what he means by the non-independence error in a chapter with [Massachusetts Institute of Technology neuroscientist] Nancy Kanwisher. They suggest that we might be interested in whether a psychology or a sociology course is harder and assess this [question] by comparing the grades of students who took both courses. In a comparison of all students, we find no difference in scores. But what if we began by selecting only students who scored higher in psychology than sociology and then statistically compared those? If we used the results of that analysis to draw a general inference about the two courses, this [strategy] would be a non-independence error, because the selection of the sample to test is not independent of the criterion being tested. This [practice] would massively bias the results.
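To make the grades example concrete, here is a small simulation; the numbers and variable names are mine, not from the Vul and Kanwisher chapter, and it only illustrates the general point that selecting the sample on the very contrast you then test manufactures a difference out of noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students = 10_000

# Psychology and sociology grades drawn from the SAME distribution:
# by construction there is no true difference between the courses.
psych = rng.normal(70, 10, n_students)
soc = rng.normal(70, 10, n_students)

print("All students:", (psych - soc).mean())   # ~0, as expected

# Non-independent selection: keep only students who did better in psychology,
# then "test" whether psychology grades are higher in that subset.
mask = psych > soc
print("Selected subset:", (psych[mask] - soc[mask]).mean())   # large positive bias
```

The subset mean comes out large and positive even though the two sets of grades were drawn from identical distributions.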

Although Vul is absolutely right that this would be a major error, he's not describing what we actually do. Vul's example assumes that the question we are interested in is how the entire brain correlates with a personality measure or responds differently to two tasks. Staying with the grades example, what social neuroscientists are really doing, however, is something closer to asking, "Across all colleges in the country, are there colleges where psychology grades are higher than sociology grades?" In other words, the question is not what the average difference is across all schools, but rather which schools show a difference. There is nothing inappropriate about asking this question or about describing the results found in those schools where a significant effect emerges.

With whole-brain analyses in fMRI, we're doing the same thing. We are interested in where significant effects are occurring in the brain and when we find them we describe the results in terms of means, correlations, and so on. We are not cherry-picking regions and then claiming these represent the effects for the whole brain.
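As a rough illustration of the kind of search Lieberman describes, here is a toy whole-brain-style analysis; the subject and region counts, the built-in effect, and the Bonferroni threshold are all my own assumptions, not details from any of the criticized studies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_regions = 20, 500   # toy numbers, not from any real study

behavior = rng.normal(size=n_subjects)            # e.g. a personality score
brain = rng.normal(size=(n_regions, n_subjects))  # simulated activity per region

# Build in one genuine, strong effect so the search has something to find.
brain[42] += 2.0 * behavior

# Whole-brain-style search: correlate every region with behavior and keep the
# regions that survive a Bonferroni-corrected threshold.
alpha = 0.05 / n_regions
survivors = []
for i in range(n_regions):
    r, p = stats.pearsonr(brain[i], behavior)
    if p < alpha:
        survivors.append((i, round(r, 2)))

print(survivors)  # answers "which regions show the effect", not a whole-brain average
```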

In other words, the debate continues. But even Lieberman admits that the whole brouhaha has been good for the field, as the criticism has inspired a new level of rigor and skeptical analysis. A correlation is a tricky thing.

We also discuss some of Lieberman's fascinating work on the reward pathway and grief and why it takes self-control to accept unfair offers.


It seems to me that a lot of the problems with these studies being questioned boil down to one simple concept that does not even require a "metric" to understand - The Conjunction Rule. TCR states that the probability of a conjunction of two events cannot be higher than the probability of either single constituent. (The p of A+B occurring together cannot be higher than the p of A or the p of B alone.)
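For what it's worth, the rule the commenter cites is easy to state numerically; the probabilities below are made up purely for illustration.

```python
p_a, p_b = 0.6, 0.5
p_a_and_b = p_a * p_b              # 0.3 if A and B are independent
assert p_a_and_b <= min(p_a, p_b)  # a conjunction is never more probable than either constituent
```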

But maybe I'm missing something.

I'm fascinated with Lieberman's work on "social rejection".

About the controversy, as the saying goes, "every cloud has a silver lining": this (healthy) debate can sharpen the methods used in this field of study.

The first point is quite sensible.

However, the second point reinforces the findings of Vul et al. rather than rebutting them, as Lieberman would like.

"Across all colleges in the country, are there colleges where psychology grades are higher than sociology grades?"

That question is meaningless in a scientific sense, and not what social neuroscience wants to ask. If you have a collection of 20,000 colleges, of course you will find one that has a difference between the two scores, but the result may very well be due to the fact that the scores have a variance, rather than anything meaningful. There are, however, two related questions which do have scientific merit.

The first is, "Across all colleges, is there a school with a bigger gap between psych scores and sociology scores than would be predicted by chance when looking at so many schools?" That is full Bonferroni correction, which would probably render pretty much all correlations mentioned in Vul et al. insignificant, even the ones that did everything right. So, while this question is valid, it is perhaps too restrictive.
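To see why full Bonferroni correction is so restrictive, here is a small sketch; the 20,000 tests and 15 subjects are hypothetical numbers, chosen only to be in the ballpark of a voxelwise whole-brain analysis with a typical fMRI sample.

```python
import math
from scipy import stats

n_tests = 20_000                 # hypothetical number of regions/voxels searched
alpha_corrected = 0.05 / n_tests

def r_needed(n, alpha):
    """Smallest |r| that reaches two-tailed significance at level alpha with n subjects."""
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t_crit / math.sqrt(n - 2 + t_crit ** 2)

print(r_needed(15, 0.05))             # roughly 0.5 uncorrected
print(r_needed(15, alpha_corrected))  # close to 0.9 after full Bonferroni correction
```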

The second is, "Is there a particular subset of colleges which show higher psych scores than sociology scores?" This is the question that Lieberman, and most social neuroscientists, would like to ask. Importantly, this question hinges on how you select that subset. If you select the subset of schools based on the exact same attribute you are testing, it's no big surprise when your test comes out positive. That is the non-independence error that Vul is talking about.

How you select that subset frames the question you will be asking. You could select your subset with a related measure, say those schools which have more students in psych than in sociology. This creates a question about how school priorities affect student achievement. Or you could select the subset on something completely unrelated, such as those schools whose football team had a winning season, which leads to a much more bizarre question positing some relationship between football and psych but not soc.
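Extending the earlier grades simulation, here is a sketch of how the selection criterion changes what a subset analysis can legitimately show; again the data and thresholds are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

psych = rng.normal(70, 10, n)
soc = rng.normal(70, 10, n)             # no true difference between courses
football_wins = rng.integers(0, 12, n)  # an attribute unrelated to either grade

# Non-independent selection: the subset is chosen on the contrast being tested.
circular = psych[psych > soc] - soc[psych > soc]

# Independent selection: the subset is chosen on an unrelated attribute.
independent = psych[football_wins >= 8] - soc[football_wins >= 8]

print(circular.mean())     # large spurious "difference" driven by the selection itself
print(independent.mean())  # hovers near zero, as it should when there is no real effect
```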

So, importantly, when looking at any form of science which reports findings for just a subset of neurons, voxels, cells, bacteria, or widgets, you should ask, 'how did they pick the subset?' Unfortunately, Vul et al. found that over half of social neuroscientists chose their subsets incorrectly.

Lieberman is being a little misleading when he says that

"Although Vul is absolutely right that this [non-independent analysis] would be a major error, he's not describing what we actually do. Vul's example assumes that the question that we are interested in is how the entire brain correlates with a personality measure or responds differently to two tasks. Staying with the grades examples, what social neuroscientists are really doing, however, is something closer to asking, "Across all colleges in the country, are there colleges where psychology grades are higher than sociology grades?" In other words, the question is not what the average difference is across all schools, but rather which schools show a difference."

But Vul's paper is full of examples of studies in which the magnitude of the correlations was quoted. This is not the same as finding that there is a correlation - it goes far beyond that. And as Vul says, it has often been done in a way which inflates the size of the correlation.

I fear that there has been a failure on both sides to make this very important distinction between the size of a correlation and the existence of a correlation. My post here explains the issue.

Although this has been a failure on both sides, it seems mainly to have benefited Vul et al.'s critics...

Kevin,
You correctly note that the key issue here is how the subsets are selected. There are good ways to select a subset, bad ways, and methods that fall in a gray area.
Taking your analogy, the trouble with Vul et al. is that they asked a survey question about whether people used any statistical measure to select their subset. If they said yes, Vul et al. assumed they used the worst type of selection method possible. In reality many of the "voodoo" papers actually used perfectly reasonable selection methods, just not the methods Vul was proposing. I haven't gone through the list, but I know a couple that essentially selected brain regions based on a completely separate task and then did their correlations using separate fMRI data. They are still on Vul's blacklist because they used a functional mask.

If Vul wants to stand by his accusations, he'll need better evidence to show that most or even any of the criticized work used the subset selection method he accuses them of using.

Neuroskeptic,
I think you have a very good point here. One of the issues, though, is that most fMRI studies care about the existence of a correlation. Did any of the papers say that, since we found an r = 0.8, we have a test that can define something with an X level of predictive value? Most of the studies are merely reporting the r value to quantify the existence of a correlation. Vul started with the assumption that the strength, not the existence, of the correlations was the key finding of the papers.

Vul had one or two good points, but they were lost in shoddy analysis and overly bombastic language. While his paper will probably improve details in methodology sections, he really lost a chance to make an actually valuable contribution to the field.

bsci: On the functional mask issue, I don't know if what you say is correct, but I know for a fact that in the paper Vul et al. explicitly say that a functional mask *could* be a perfectly good approach. So if they have "redlisted" a paper merely for functional masking, they are doing something that they themselves condemn.

On the issue of strength vs. existence - well, you have a point. But as Vul et al. say, and not unreasonably - if you have a correlation of r = 0.8, that's a lot more scientifically interesting than a correlation of r = 0.2. Even if no one explicitly says so, a correlation of r = 0.8 screams "further research is needed!" whereas a correlation of 0.2 is not so hot.

Don't forget that correlations don't really account for sample size. The chance of a true correlation of 0.4 being estimated as a correlation of 0.8 is much higher with a sample size of 15 than with a sample size of 100. By the same token, an r = 0.2 with a sample size of 10,000 can be a major finding. fMRI tends to use small sample sizes, which give less meaning to the actual correlation values.
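This is easy to check with a quick simulation; the true correlation, sample sizes, and trial count below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
true_r = 0.4
cov = [[1.0, true_r], [true_r, 1.0]]

def frac_estimated_above(n, cutoff=0.8, trials=10_000):
    """Fraction of samples of size n whose observed correlation exceeds cutoff,
    when the true underlying correlation is true_r."""
    hits = 0
    for _ in range(trials):
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        if np.corrcoef(x, y)[0, 1] >= cutoff:
            hits += 1
    return hits / trials

print(frac_estimated_above(15))   # roughly 1% of small samples overshoot to r >= 0.8
print(frac_estimated_above(100))  # essentially never happens with the larger sample
```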
