Maggie Fox writes:
Brain scans may be able to predict what you will do better than you can yourself . . . They found a way to interpret “real time” brain images to show whether people who viewed messages about using sunscreen would actually use sunscreen during the following week.
The scans were more accurate than the volunteers were, Emily Falk and colleagues at the University of California Los Angeles reported in the Journal of Neuroscience. . . .
About half the volunteers had correctly predicted whether they would use sunscreen. The research team analyzed and re-analyzed the MRI scans to see if they could find any brain activity that would do better.
Activity in one area of the brain, a particular part of the medial prefrontal cortex, provided the best information.
“From this region of the brain, we can predict for about three-quarters of the people whether they will increase their use of sunscreen beyond what they say they will do,” Lieberman said.
“It is the one region of the prefrontal cortex that we know is disproportionately larger in humans than in other primates,” he added. “This region is associated with self-awareness, and seems to be critical for thinking about yourself and thinking about your preferences and values.”
Hmm . . . they “analyzed and re-analyzed the scans to see if they could find any brain activity” that would predict better than 50%?! This doesn’t sound so promising. But maybe the reporter messed up on the details . . .
I took advantage of my library subscription to take a look at the article, “Predicting Persuasion-Induced Behavior Change from the Brain,” by Emily Falk, Elliot Berkman, Traci Mann, Brittany Harrison, and Matthew Lieberman. Here’s what they say:
- “Regions of interest were constructed based on coordinates reported by Soon et al. (2008) in MPFC and precuneus, regions that also appeared in a study of persuasive messaging.” OK, so they picked two regions of interest ahead of time. They didn’t just search for “any brain activity.” I’ll take their word for it that they just looked at these two, that they didn’t actually look at 50 regions and then report just two.
- Their main result had a t-statistic of 2.3 (on 18 degrees of freedom, thus statistically significant at the 3% level) in one of the two regions they looked at, and a t-statistic of 1.5 (not statistically significant) in the other. A simple multiple-comparisons correction for the two regions doubles the p-value of 0.03 to an over-the-threshold 0.06, which I think would make the result unpublishable! On the other hand, a simple combination of the two, treating them as independent, gives a healthy t-statistic of (1.5+2.3)/sqrt(2) = 2.7, although that ignores any possible correlation between the two regions (they don’t seem to supply that information in their article).
- They also do a cross-validation but this seems 100% pointless to me since they do the cross-validation on the region that already “won” on the full data analysis. For the cross-validation to mean anything at all, they’d have to use the separate winner on each of the cross-validatory fits.
- As an outcome, they use before-after change. They should really control for the “before” measurement as a regression predictor. That’s a freebie. And, when you’re operating at a 6% significance level, you should take any freebie that you can get! (It’s possible that they tried adjusting for the “before” measurement and it didn’t work, but I assume they didn’t do that, since I didn’t see any report of such an analysis in the article.)
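The arithmetic in the second bullet above can be checked with a few lines of Python. This is only a sketch using the standard library: the function names are mine, the t-distribution tail area is computed by simple numerical integration, and the `rho` argument is a hypothetical correlation between the two regions' statistics (the article does not report one).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the t density over [0, |t|] by Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area

def combined_stat(t1, t2, rho=0.0):
    """Sum of two unit-variance statistics, rescaled to unit variance.

    rho is an ASSUMED correlation between the two statistics; rho=0
    reproduces the (t1 + t2) / sqrt(2) calculation in the text.
    """
    return (t1 + t2) / math.sqrt(2 + 2 * rho)

DF = 18
p_mpfc = two_sided_p(2.3, DF)        # the "winning" region
p_precuneus = two_sided_p(1.5, DF)   # the other region
p_bonferroni = min(1.0, 2 * p_mpfc)  # correct for having looked at two regions

print(f"p (t=2.3): {p_mpfc:.3f}")
print(f"p (t=1.5): {p_precuneus:.3f}")
print(f"Bonferroni-corrected p for the winner: {p_bonferroni:.3f}")
print(f"combined, rho=0:   {combined_stat(2.3, 1.5):.2f}")
print(f"combined, rho=0.5: {combined_stat(2.3, 1.5, rho=0.5):.2f}")
```

The last line is there to show the direction of the caveat: if the two regions are positively correlated, the combined statistic is smaller than the independence calculation suggests.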
The bottom line
I’m not saying that the reported findings are wrong, I’m just saying that they’re not necessarily statistically significant in the usual way this term is used. I think that, in the future, such work would be improved by more strongly linking the statistical analysis to the psychological theories. Rather than simply picking two regions to look at, then taking the winner in a study of n=20 people, and going from there to the theories, perhaps they could more directly model what they’re expecting to see.
The difference between . . .
Also, the difference between “significant” and “not significant” is not itself statistically significant. How is this relevant in the present study? They looked at two regions, MPFC and precuneus. Both showed positive correlations, one with a t-value of 2.3, one with a t-value of 1.5. The first of these is statistically significant (well, it is, if you ignore that it’s the maximum of two values), the second is not. But the difference is not anything close to statistically significant, not at all! So why such a heavy emphasis on the winner and such a neglect of #2?
Here’s the count from a simple document search:
MPFC: 20 instances (including 2 in the abstract)
precuneus: 8 instances (0 in the abstract)
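To put a rough number on the difference between the two regions, here is a minimal sketch. It assumes the two t-statistics are independent and can be treated as approximately normal with unit standard error; neither assumption is checked against the article, and a positive correlation between the regions would change the answer.

```python
import math

def diff_z(t1, t2):
    """z-score for the difference of two independent unit-variance statistics."""
    return (t1 - t2) / math.sqrt(2)

def normal_two_sided_p(z):
    """Two-sided tail area under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

z_diff = diff_z(2.3, 1.5)  # MPFC vs. precuneus
p_diff = normal_two_sided_p(z_diff)

print(f"z for the difference: {z_diff:.2f}")
print(f"two-sided p-value:    {p_diff:.2f}")
```

Under these assumptions the difference between the regions is well under one standard error, which is the sense in which it is not anything close to statistically significant.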
P.S. The “picked just two regions” bit gives a sense of why I prefer Bayesian inference to classical hypothesis testing. The right thing, I think, is actually to look at all 50 regions (or 100, or however many regions there are) and do an analysis including all of them. I don’t mean simply picking the region that is most strongly correlated with the outcome and then doing a correction (that’s not the most statistically efficient thing to do; you’re just asking, begging, to be overwhelmed by noise), but rather using the prior information about regions in a subtler way than simply picking out 2 and ignoring the other 48. For example, you could have a region-level predictor which represents prior belief in the region’s importance. Or you could group the regions into a few pre-chosen categories and then estimate a hierarchical model with each group of regions being its own batch with group-level mean and standard deviation estimated from data. The point is, you want to use the information you have (prior knowledge from the literature) without it unduly restricting the possibilities for discovery in your data analysis.
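As a very crude stand-in for the hierarchical model, here is a sketch of partial pooling for a single batch of regions. It is not a full Bayesian analysis: it uses a method-of-moments (empirical Bayes) estimate of the between-region variance on the Fisher z scale, with no region-level predictors and no grouping. The function name and the correlations fed into it are made up for illustration; they are not from the article.

```python
import math
import statistics

def shrink_correlations(rs, n_subjects):
    """Partially pool correlations toward their common mean.

    Model (an assumption): atanh(r_j) ~ N(theta_j, 1/(n-3)) and
    theta_j ~ N(mu, tau^2), with mu and tau^2 estimated by moments.
    Returns the shrunken correlations and the shrinkage factor.
    """
    zs = [math.atanh(r) for r in rs]
    se2 = 1.0 / (n_subjects - 3)                     # sampling variance of each z
    mu = statistics.fmean(zs)
    tau2 = max(0.0, statistics.variance(zs) - se2)   # between-region variance
    factor = tau2 / (tau2 + se2)                     # 0 = pool fully, 1 = no pooling
    shrunk = [math.tanh(mu + factor * (z - mu)) for z in zs]
    return shrunk, factor

# Made-up correlations for ten hypothetical regions, n = 20 subjects.
raw = [0.48, 0.33, 0.10, -0.05, 0.21, -0.18, 0.02, 0.30, -0.12, 0.15]
shrunk, factor = shrink_correlations(raw, n_subjects=20)

print(f"shrinkage factor: {factor:.2f}")
for r, s in zip(raw, shrunk):
    print(f"raw {r:+.2f} -> shrunk {s:+.2f}")
```

A shrinkage factor near 0 means the spread across regions is about what you’d expect from sampling noise alone with 20 people, so the top-ranked region gets pulled most of the way back toward the others.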
Near the end, they write:
In addition, we observed increased activity in regions involved in memory encoding, attention, visual imagery, motor execution and imitation, and affective experience with increased behavior change.
These were not pre-chosen regions, which is fine, but at this point I’d like to see the histogram of correlations for all the regions, along with a hierarchical model that allows appropriate shrinkage. Or even a simple comparison to the distribution of correlations one might expect to see by chance. By suggesting this, I’m not trying to imply that all the findings in this paper are due to chance; rather, I’m trying to use statistical methods to subtract out the chance variation as much as possible.
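The “simple comparison to the distribution of correlations one might expect to see by chance” could look something like the simulation below. This is a sketch under stated assumptions: 50 regions is a hypothetical number, the regions are simulated as independent pure noise (real brain regions are correlated with each other), and the benchmark is the correlation implied by a t-statistic of 2.3 on 18 degrees of freedom.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_abs_null_corr(n_subjects, n_regions, rng):
    """Largest |correlation| between a noise outcome and n_regions noise predictors."""
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    best = 0.0
    for _ in range(n_regions):
        activity = [rng.gauss(0, 1) for _ in range(n_subjects)]
        best = max(best, abs(pearson(activity, outcome)))
    return best

def r_from_t(t, df):
    """Correlation implied by a t-statistic on df degrees of freedom."""
    return t / math.sqrt(t * t + df)

rng = random.Random(0)
N_SUBJECTS = 20
N_REGIONS = 50   # hypothetical; the article does not give a count
N_SIMS = 200

null_max = [max_abs_null_corr(N_SUBJECTS, N_REGIONS, rng) for _ in range(N_SIMS)]
r_observed = r_from_t(2.3, 18)
frac_beating = sum(m >= r_observed for m in null_max) / N_SIMS

print(f"correlation implied by t=2.3, df=18: {r_observed:.2f}")
print(f"median of the best noise-only |r|:   {statistics.median(null_max):.2f}")
print(f"share of noise-only runs that beat it: {frac_beating:.2f}")
```

Plotting `null_max` (or the full set of simulated correlations) next to the observed correlations for all the regions would give the histogram comparison directly.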
P.P.S. Just to say this one more time: I’m not at all trying to claim that the researchers are wrong. Even if they haven’t proven anything in a convincing way, I’ll take their word for it that their hypothesis makes scientific sense. And, as they point out, their data are definitely consistent with their hypotheses.