The zombie menace has so far been studied only qualitatively or through the use of mathematical models without empirical content. We propose to use a new tool in survey research to allow zombies to be studied indirectly without risk to the interviewers.

It’s on Arxiv, so you know it’s real.

]]>Brain scans may be able to predict what you will do better than you can yourself . . . They found a way to interpret “real time” brain images to show whether people who viewed messages about using sunscreen would actually use sunscreen during the following week.

The scans were more accurate than the volunteers were, Emily Falk and colleagues at the University of California Los Angeles reported in the Journal of Neuroscience. . . .

About half the volunteers had correctly predicted whether they would use sunscreen. The research team analyzed and re-analyzed the MRI scans to see if they could find any brain activity that would do better.

Activity in one area of the brain, a particular part of the medial prefrontal cortex, provided the best information.

“From this region of the brain, we can predict for about three-quarters of the people whether they will increase their use of sunscreen beyond what they say they will do,” Lieberman said.

“It is the one region of the prefrontal cortex that we know is disproportionately larger in humans than in other primates,” he added. “This region is associated with self-awareness, and seems to be critical for thinking about yourself and thinking about your preferences and values.”

Hmm . . . they “analyzed and re-analyzed the scans to see if they could find *any* brain activity” that would predict better than 50%?! This doesn’t sound so promising. But maybe the reporter messed up on the details . . .

I took advantage of my library subscription to take a look at the article, “Predicting Persuasion-Induced Behavior Change from the Brain,” by Emily Falk, Elliot Berkman, Traci Mann, Brittany Harrison, and Matthew Lieberman. Here’s what they say:

- “Regions of interest were constructed based on coordinates reported by Soon et al. (2008) in MPFC and precuneus, regions that also appeared in a study of persuasive messaging.” OK, so they picked two regions of interest ahead of time. They didn’t just search for “any brain activity.” I’ll take their word for it that they just looked at these two, that they didn’t actually look at 50 regions and then say they reported just two.

- Their main result had a t-statistic of 2.3 (on 18 degrees of freedom, thus statistically significant at the 3% level) in one of the two regions they looked at, and a t-statistic of 1.5 (not statistically significant) in the other. A simple multiple-comparisons correction takes the p-value of 0.03 and bounces it up to an over-the-threshold 0.06, which I think would make the result unpublishable! On the other hand, a simple average gives a healthy t-statistic of (1.5+2.3)/sqrt(2) = 2.7, although that ignores any possible correlation between the two regions (they don’t seem to supply that information in their article).

- They also do a cross-validation but this seems 100% pointless to me since they do the cross-validation on the region that already “won” on the full data analysis. For the cross-validation to mean anything at all, they’d have to use the separate winner on each of the cross-validatory fits.

- As an outcome, they use before-after change. They should really control for the “before” measurement as a regression predictor. That’s a freebie. And, when you’re operating at a 6% significance level, you should take any freebie that you can get! (It’s possible that they tried adjusting for the “before” measurement and it didn’t work, but I assume they didn’t do that, since I didn’t see any report of such an analysis in the article.)
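The arithmetic behind the second bullet can be checked with a short sketch. It uses only the numbers quoted above (t = 2.3 and t = 1.5 on 18 degrees of freedom). The helper names are made up for illustration, the combined statistic treats the two t-values as roughly unit-variance, and the correlation `rho = 0.5` is a purely hypothetical value, since the article does not report one.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for a t-statistic, via Simpson's rule on the t density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps                      # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                           # integral of the density from 0 to |t|
    return 2 * (0.5 - area)

def combined(t1, t2, rho=0.0):
    """Sum of two (roughly) unit-variance statistics with correlation rho, rescaled."""
    return (t1 + t2) / math.sqrt(2 + 2 * rho)

t_mpfc, t_precuneus, df = 2.3, 1.5, 18      # values quoted in the post

p_mpfc = t_two_sided_p(t_mpfc, df)
p_bonferroni = min(1.0, 2 * p_mpfc)         # simple correction for looking at two regions

combined_indep = combined(t_mpfc, t_precuneus)          # the calculation in the text
combined_corr = combined(t_mpfc, t_precuneus, rho=0.5)  # rho is a hypothetical value

print(f"p for MPFC alone:            {p_mpfc:.3f}")
print(f"after two-region correction: {p_bonferroni:.3f}")
print(f"combined, rho = 0:           {combined_indep:.2f}")
print(f"combined, rho = 0.5:         {combined_corr:.2f}")
```

The last line shows why the missing correlation matters: any positive correlation between the two regions pulls the combined statistic down from the independent-case value of about 2.7.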

**The bottom line**

I’m not saying that the reported findings are wrong, I’m just saying that they’re not necessarily statistically significant in the usual way this term is used. I think that, in the future, such work would be improved by more strongly linking the statistical analysis to the psychological theories. Rather than simply picking two regions to look at, then taking the winner in a study of n=20 people, and going from there to the theories, perhaps they could more directly model what they’re expecting to see.

**The difference between . . . **

Also, the difference between “significant” and “not significant” is not itself statistically significant. How is this relevant in the present study? They looked at two regions, MPFC and precuneus. Both showed positive correlations, one with a t-value of 2.3, one with a t-value of 1.5. The first of these is statistically significant (well, it is, if you ignore that it’s the maximum of two values), the second is not. But the difference is not anything close to statistically significant, not at all! So why such a heavy emphasis on the winner and such a neglect of #2?
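A rough version of that comparison, as a sketch: assume (this is not stated in the article) that the two estimates are independent and have about the same standard error, so that on the standard-error scale their difference is 2.3 − 1.5 with standard error sqrt(2). A normal approximation is used in place of the t distribution.

```python
import math

def normal_two_sided_p(z):
    """Two-sided tail probability of a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

t_mpfc, t_precuneus = 2.3, 1.5   # values quoted in the post

# Assumption: equal standard errors and independent estimates.
z_diff = (t_mpfc - t_precuneus) / math.sqrt(2)
p_diff = normal_two_sided_p(z_diff)

print(f"z for the difference: {z_diff:.2f}")
print(f"two-sided p-value:    {p_diff:.2f}")
```

Under these assumptions the difference between the two regions is well under one standard error, nowhere near any conventional threshold.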

Here’s the count from a simple document search:

MPFC: 20 instances (including 2 in the abstract)

precuneus: 8 instances (0 in the abstract)

P.S. The “picked just two regions” bit gives a sense of why I prefer Bayesian inference to classical hypothesis testing. The right thing, I think, is actually to look at all 50 regions (or 100, or however many regions there are) and do an analysis including all of them. Not simply picking the region that is most strongly correlated with the outcome and then doing a correction (that’s not the most statistically efficient thing to do; you’re just asking, begging, to be overwhelmed by noise), but rather using the prior information about regions in a subtler way than simply picking out 2 and ignoring the other 48. For example, you could have a region-level predictor which represents prior belief in the region’s importance. Or you could group the regions into a few pre-chosen categories and then estimate a hierarchical model with each group of regions being its own batch with group-level mean and standard deviation estimated from data. The point is, you have information you want to use–prior knowledge from the literature–without it unduly restricting the possibilities for discovery in your data analysis.

Near the end, they write:

In addition, we observed increased activity in regions involved in memory encoding, attention, visual imagery, motor execution and imitation, and affective experience with increased behavior change.

These were not pre-chosen regions, which is fine, but at this point I’d like to see the histogram of correlations for *all* the regions, along with a hierarchical model that allows appropriate shrinkage. Or even a simple comparison to the distribution of correlations one might expect to see by chance. By suggesting this, I’m *not* trying to imply that all the findings in this paper are due to chance; rather, I’m trying to use statistical methods to subtract out the chance variation as much as possible.
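Here is a minimal sketch of the “simple comparison to chance” idea. Everything in it is simulated noise: 20 people (matching the 18 degrees of freedom above) and 50 regions, with the regions assumed independent of each other, which real brain regions are not. The function names and the count of 50 are illustrative, not taken from the article.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_null_correlation(rng, n_people=20, n_regions=50):
    """Largest |correlation| between a noise outcome and n_regions noise predictors."""
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    return max(
        abs(correlation([rng.gauss(0, 1) for _ in range(n_people)], outcome))
        for _ in range(n_regions)
    )

rng = random.Random(0)
sims = sorted(max_null_correlation(rng) for _ in range(500))
median_max = sims[len(sims) // 2]

# Correlation that corresponds to t = 2.3 on 18 degrees of freedom.
r_reported = 2.3 / math.sqrt(2.3 ** 2 + 18)
share_exceeding = sum(s >= r_reported for s in sims) / len(sims)

print(f"correlation implied by t = 2.3:     {r_reported:.2f}")
print(f"median of the max null correlation: {median_max:.2f}")
print(f"share of null runs beating t = 2.3: {share_exceeding:.2f}")
```

This is only the chance benchmark, not the hierarchical model; it shows how large the best of 50 pure-noise correlations tends to be when n = 20.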

P.P.S. Just to say this one more time: I’m not at all trying to claim that the researchers are wrong. Even if they haven’t proven anything in a convincing way, I’ll take their word for it that their hypothesis makes scientific sense. And, as they point out, their data are definitely consistent with their hypotheses.

P.P.P.S. For those who haven’t been following these issues, see here, here, here, and here.

]]>It’s interesting stuff, and he gets into some statistical applications at the end, so I’ll give you my take on it.

But first, some background.

About two hundred years ago, the mathematician/physicist Laplace discovered what is now called the central limit theorem, which is that, under certain conditions, the average of a large number of small random variables has an approximate normal (bell-shaped) distribution. A bit over 100 years ago, social scientists such as Galton applied this theorem to all sorts of biological and social phenomena. The central limit theorem, in its generality, is also important in the information that it indirectly conveys when it fails.

For example, the heights of adult men, or of adult women, follow a nicely bell-shaped distribution, but the heights of all adults taken together follow a different, more spread-out distribution. This is because your height is the sum of many small factors and one large factor–your sex. The conditions of the theorem are that no single factor (or small number of factors) should be important on its own. For another example, it has long been observed that incomes do not follow a bell-shaped curve, even on the logarithmic scale. Nor do sizes of cities and many other social phenomena. These “power-law curves,” which don’t fit the central limit theorem, have motivated social scientists such as Herbert Simon to come up with processes more complicated than simple averaging (for example, models in which the rich get richer).

The central limit theorem is an example of an attractor–a mathematical model that appears as a limit as sample size gets large. The key feature of an attractor is that it destroys information. Think of it as being like a funnel: all sorts of things can come in, but a single thing–the bell-shaped curve–comes out. (Or, for other models, such as that used to describe the distribution of incomes, the attractor might be a power-law distribution.) The beauty of an attractor is that, if you believe the model, it can be used to explain an observed pattern without needing to know the details of its components. Thus, for example, we can see that the heights of men or of women have bell-shaped distributions, without knowing the details of the many small genetic and environmental influences on height.
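The funnel can be seen in a small simulation. This sketch averages 50 draws from three quite different input distributions (the three choices and the sample sizes are arbitrary), standardizes the averages, and checks one signature of the bell curve: about 68% of values should fall within one standard deviation of the mean.

```python
import random
import statistics

def standardized_means(draw, rng, n_terms=50, n_means=4000):
    """Average n_terms draws, repeat n_means times, and standardize the averages."""
    means = [
        statistics.fmean([draw(rng) for _ in range(n_terms)])
        for _ in range(n_means)
    ]
    mu, sd = statistics.fmean(means), statistics.pstdev(means)
    return [(m - mu) / sd for m in means]

# Three very different inputs to the funnel.
inputs = {
    "uniform": lambda rng: rng.random(),
    "exponential": lambda rng: rng.expovariate(1.0),
    "coin flip": lambda rng: float(rng.random() < 0.5),
}

rng = random.Random(1)
within_one_sd = {}
for name, draw in inputs.items():
    z = standardized_means(draw, rng)
    within_one_sd[name] = sum(abs(v) < 1 for v in z) / len(z)

for name, share in within_one_sd.items():
    print(f"{name:12s} share within 1 sd: {share:.3f}")
```

The three shares come out close to each other even though the inputs look nothing alike, which is the sense in which the attractor destroys information.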

Now to random matrices. A random matrix is an array of numbers, where each number is drawn from some specified probability distribution. You can compute the eigenvalues of a square matrix–that’s a set of numbers summarizing the structure of the matrix–and they will have a probability distribution that is induced by the probability distribution of the individual elements of the matrix. Over the past few decades, mathematicians such as Alan Edelman have performed computer simulations and proved theorems deriving the distribution of the eigenvalues of a random matrix, as the dimension of the matrix becomes large.

It appears that the eigenvalue distribution is an attractor. That is, for a broad range of different input models (distributions of the random matrices), you get the same output–the same eigenvalue distribution–as the sample size becomes large. This is interesting, and it’s hard to prove. (At least, it seemed hard to prove the last time I looked at it, about 20 years ago, and I’m sure that it’s even harder to make advances in the field today!)
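A small pure-Python sketch of “different inputs, same output” for random matrices. It checks only one feature of the eigenvalue distribution, the size of the largest eigenvalue, rather than the whole distribution. The matrix size, the three entry distributions (all scaled to mean 0 and variance 1), and the use of power iteration are choices made here for illustration; the expectation that the scaled value lands near 2 comes from the standard semicircle result for symmetric matrices with unit-variance entries.

```python
import math
import random

def random_symmetric(n, draw, rng):
    """n-by-n symmetric matrix whose upper-triangle entries are independent draws."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = draw(rng)
    return a

def largest_abs_eigenvalue(a, rng, iterations=100):
    """Power iteration: estimate the largest |eigenvalue| as the norm of A v for unit v."""
    n = len(a)
    v = [rng.gauss(0, 1) for _ in range(n)]
    estimate = 0.0
    for _ in range(iterations):
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        v = [sum(row[j] * v[j] for j in range(n)) for row in a]
        estimate = math.sqrt(sum(x * x for x in v))
    return estimate

# Three entry distributions, each with mean 0 and variance 1.
entry_draws = {
    "gaussian": lambda rng: rng.gauss(0, 1),
    "random signs": lambda rng: rng.choice((-1.0, 1.0)),
    "uniform": lambda rng: rng.uniform(-math.sqrt(3), math.sqrt(3)),
}

rng = random.Random(3)
n = 80
scaled_top = {}
for name, draw in entry_draws.items():
    matrix = random_symmetric(n, draw, rng)
    scaled_top[name] = largest_abs_eigenvalue(matrix, rng) / math.sqrt(n)

for name, value in scaled_top.items():
    print(f"{name:13s} largest |eigenvalue| / sqrt(n): {value:.2f}")
```

Power iteration slightly underestimates the true value after a finite number of steps, so the printed numbers are approximate; the point is only that they are similar across the three input models.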

Now, to return to the news article. If the eigenvalue distribution is an attractor, this means that a lot of physical and social phenomena which can be modeled by eigenvalues (including, apparently, quantum energy levels and some properties of statistical tests) might have a common structure. Just as, at a similar level, we see the normal distribution and related functions in all sorts of unusual places.

Consider this quote from Buchanan’s article:

Recently, for example, physicist Ferdinand Kuemmeth and colleagues at Harvard University used it to predict the energy levels of electrons in the gold nanoparticles they had constructed. Traditional theories suggest that such energy levels should be influenced by a bewildering range of factors, including the precise shape and size of the nanoparticle and the relative position of the atoms, which is considered to be more or less random. Nevertheless, Kuemmeth’s team found that random matrix theory described the measured levels very accurately.

That’s what an attractor is all about: different inputs, same output.

Thus, I don’t quite understand this quote:

Random matrix theory has got mathematicians like Percy Deift of New York University imagining that there might be more general patterns there too. “This kind of thinking isn’t common in mathematics,” he notes. “Mathematicians tend to think that each of their problems has its own special, distinguishing features. But in recent years we have begun to see that problems from diverse areas, often with no discernible connections, all behave in a very similar way.”

This doesn’t seem like such a surprise to me–it seems very much in the tradition of mathematical modeling. But maybe there’s something I’m missing here.

Finally, Buchanan turns to social science:

An economist may sift through hundreds of data sets looking for something to explain changes in inflation – perhaps oil futures, interest rates or industrial inventories. Businesses such as Amazon.com rely on similar techniques to spot patterns in buyer behaviour and help direct their advertising.

While random matrix theory suggests that this is a promising approach, it also points to hidden dangers. As more and more complex data is collected, the number of variables being studied grows, and the number of apparent correlations between them grows even faster. With enough variables to test, it becomes almost certain that you will detect correlations that look significant, even if they aren’t. . . . even if these variables are all fluctuating randomly, the largest observed correlation will be large enough to seem significant.

This is well known. The new idea is that mathematical theory might enable the distribution of these correlations to be understood for a general range of cases. That’s interesting but doesn’t alter the basic statistical ideas.

Beyond this, I think there’s a flaw in the idea that statistics (or econometrics) proceeds by blindly looking at the correlations among all variables. In my experience, it makes more sense to fit a hierarchical model, using structure in the economic indexes rather than just throwing them all in as predictors. We are in fact studying the properties of hierarchical models when the number of cases and variables becomes large, and it’s a hard problem. Maybe the ideas from random matrix theory will be relevant here too.

Buchanan writes:

In recent years, some economists have begun to express doubts over predictions made from huge volumes of data, but they are in the minority. Most embrace the idea that more measurements mean better predictive abilities. That might be an illusion, and random matrix theory could be the tool to separate what is real and what is not.

I’m with most economists here: I think that, on average, more measurements do mean better predictive abilities! Maybe not if you are only allowed to look at correlations and least-squares regressions, but if you can model with more structure then, yes, more information should be better.

]]>But if it’s true, where is he leaving to? . . . “Wall Street consulting” is probably a polite way of saying “a return to DE Shaw”, which happily paid Larry $5 million for one year of one-day-a-week work . . . The Summers exit could well be the most lucrative use of the revolving door yet seen in the short history of the Obama administration: if he was willing to work full time, Summers could command significantly more than the $10 million a year Citigroup paid Bob Rubin when Rubin left Treasury.

As a result, Obama and his chief of staff are going to have to be very careful about exactly how they manage any Summers exit. . . . it’s going to be the easiest thing in the world for the Republicans to paint the Obama administration as the party of Wall Street fat cats.

Democrats as the party of Wall Street . . . this reminds me of the story of the political contributions of Richard Fuld, the disgraced former head of Lehman Brothers. . . .

]]>Surveying the political landscape, I [Bartlett] didn’t think the Republican candidate, whoever it might be, was very likely to win against whoever the Democratic candidate might be. Therefore I concluded that it was in the interest of conservatives to support the more conservative Democratic candidate . . . Hillary Clinton . . . probably would be governing significantly more conservatively than Obama.

I’m surprised to hear this, because I thought the consensus view among conservatives–general voters and elites alike–was that Hillary Clinton was extremely liberal, that Bill was the moderate one with Hillary pulling him to the left. Bartlett argues that Hillary Clinton was more conservative than Barack Obama based on their voting records in the Senate, but I thought the consensus among conservatives was that Hillary Clinton’s core beliefs were far left, and that her Senate votes were just a matter of positioning in preparation for her presidential push.

I also had the impression that support for Hillary Clinton from conservative Republicans in 2008 was coming from three factors:

1. A feeling that Hillary Clinton was a polarizing and unpopular figure and thus would be a weaker candidate for the Democrats in the general election.

2. A desire to fan the flames of the contentious Democratic primary. Since Clinton was behind in the race, supporting her would keep the battle going.

3. Once it was clear that Barack Obama was probably going to become the Democratic nominee (and, ultimately, president), it was natural for Obama opponents to put him down by saying nice things about his opposition (Hillary Clinton, in this case).

Bartlett writes:

I [Bartlett] also noticed that some conservatives were saying nice things about Hillary–people like National Review editor Rich Lowry, Weekly Standard editor Bill Kristol, New York Times columnist David Brooks and even right-wing columnist Ann Coulter.

I don’t buy it. I’m pretty sure that, had Hillary Clinton won the nomination and Obama lost, we would’ve heard a lot from conservatives about how the Democrats chose the old-fashioned New Deal-style liberal instead of the modern consensus-builder. I say this not as a slam on Republicans or conservatives–it’s just natural partisanship.

Coulter, for example, said on 1 Feb 2008 that she would campaign for Hillary Clinton if she were running against McCain. I don’t believe this for one minute. This seems much more like positioning to me, consistent with items 1, 2, and 3 above. Similarly, Bill Kristol on 28 April 2008 wrote, “we also see the liberal media failing to give Hillary Clinton the respect she deserves,” and went on about her ability as a candidate. This doesn’t sound like an endorsement of her conservatism; it sounds much more like a desire for the primary election battle to continue.

Another data point from Bartlett:

A Fox News/Opinion Dynamics poll last October asked people if Hillary had won the election would she be doing a better or worse job than Obama. Among Republicans, 34% said [Hillary would be better], while only 22% of Democrats did.

In this poll, 34% of Republicans thought Hillary would be doing a better job than Barack, 29% thought worse, 20% thought same, and 23% said they didn’t know. (The last is what my response would be, incidentally.) I think these survey results reflect anti-Obama feeling more than anything else, though. Again, I’d bet that if Hillary Clinton were president, similar numbers of Republicans would be saying that Obama would be doing a better job.

I’m not questioning Bartlett’s sincerity here, just suggesting that his view of Hillary Clinton as being (relatively) conservative might not be so much of a consensus as he might think.

]]>Using nationally-representative data from the [1994] General Social Survey, we [Conley and Rauscher] find that female offspring induce more conservative political identification. We hypothesize that this results from the change in reproductive fitness strategy that daughters may evince.

But economists Andrew Oswald and Nattavudh Powdthavee have found the **exact opposite**:

We [Oswald and Powdthavee] document evidence that having daughters leads people to be more sympathetic to left-wing parties. Giving birth to sons, by contrast, seems to make people more likely to vote for a right-wing party. Our data, which are primarily from Great Britain, are longitudinal. We also report corroborative results for a German panel.

How to resolve these? See here.

]]>Then, the other day, someone pointed me to this analysis by Reid Wilson of a survey of TV sports watchers.

The graph is very well done. In particular, the red and blue coloring (indicating points to the left or right of the zero line) and the light/dark shading (indicating points above or below the center line on the vertical axis) are good ideas, I think: even though they convey no additional information, they draw attention to key aspects of the data.

Wilson describes the data as “survey results from a total of 218K interviews between Aug. ’08 and Sept. ’09.” So I guess the standard errors are basically zero!

P.S. Wilson does this weird thing where, except in the title of his article and the label of his graph, he never uses the terms Republican and Democrat, instead writing “GOPer” and “Dem.” What’s with that? Has he just been writing political news for so long that he’s tired of typing out the full words? I don’t actually recall ever having seen the term GOPer used before, and it was a bit jarring to me. Otherwise I thought the article was fine.

]]>The randomized controlled clinical trial is the gold standard scientific method for the evaluation of diagnostic and treatment interventions. Such trials are cited frequently as the authoritative foundation for evidence-based management policies. Nevertheless, they have a number of limitations that challenge the interpretation of the results. The strength of evidence is often judged by conventional tests that rely heavily on statistical significance. Less attention has been paid to the clinical significance or the practical importance of the treatment effects. One should be cautious that extremely large studies might be more likely to find a formally statistically significant difference for a trivial effect that is not really meaningfully different from the null. Trials often employ composite end points that, although they enable assessment of nonfatal events and improve trial efficiency and statistical precision, entail a number of shortcomings that can potentially undermine the scientific validity of the conclusions drawn from these trials. Finally, clinical trials often employ extensive subgroup analysis. However, lack of attention to proper methods can lead to chance findings that might misinform research and result in suboptimal practice. Accordingly, this review highlights these limitations using numerous examples of published clinical trials and describes ways to overcome these limitations, thereby improving the interpretability of research findings.

This reasonable article reminds me of a number of things that come up repeatedly on my (other) blog and in my work, including the distinction between statistical and practical significance, the importance of interactions, and how much I hate acronyms.

They also recommend composite end points (see page 418 of the above-linked article), which is a point that Jennifer and I emphasize in chapter 4 of our book and which comes up *all the time, over and over* in my applied research and consulting. If I had to come up with one statistical tip that would be most useful to you–that is, good advice that’s easy to apply and which you might not already know–it would be to use transformations. Log, square-root, etc.–yes, all that, but more! I’m talking about transforming a continuous variable into several discrete variables (to model nonlinear patterns such as voting by age) and combining several discrete variables to make something continuous (those “total scores” that we all love). And *not* doing dumb transformations such as the use of a threshold to break up a perfectly useful continuous variable into something binary. I don’t care if the threshold is “clinically relevant” or whatever–just don’t do it. If you gotta discretize, for Christ’s sake break the variable into 3 categories.

This all seems quite obvious but people don’t know about it. What gives? I have a theory, which goes like this. People are trained to run regressions “out of the box,” not touching their data at all. Why? For two reasons:

1. Touching your data before analysis seems like cheating. If you do your analysis blind (perhaps not even changing your variable names or converting them from ALL CAPS), then you can’t cheat.

2. In classical (non-Bayesian) statistics, linear transformations on the predictors have no effect on inferences for linear regression or generalized linear models. When you’re learning applied statistics from a classical perspective, transformations tend to get downplayed, and they are considered as little more than tricks to approximate a normal error term (and the error term, as we discuss in our book, is generally the least important part of a model).
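Point 2 can be illustrated with a short sketch. The data are simulated (the age range, the slope of 0.03, and the noise level are all made up), and the regression is a plain least-squares fit with one predictor. Re-expressing age in decades centered at 50 is a linear transformation of the predictor: it rescales the coefficient but leaves the t-statistic for the slope unchanged.

```python
import math
import random

def slope_and_t(x, y):
    """Least-squares slope of y on x and its classical t-statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

rng = random.Random(2)
age_years = [rng.uniform(18, 90) for _ in range(200)]        # simulated predictor
outcome = [0.03 * a + rng.gauss(0, 1) for a in age_years]    # simulated outcome

# Linear transformation of the predictor: center at 50, measure in decades.
age_decades = [(a - 50) / 10 for a in age_years]

slope_raw, t_raw = slope_and_t(age_years, outcome)
slope_new, t_new = slope_and_t(age_decades, outcome)

print(f"per year:   slope = {slope_raw:.4f}, t = {t_raw:.3f}")
print(f"per decade: slope = {slope_new:.4f}, t = {t_new:.3f}")
```

From the classical point of view the two fits are the same inference, which is one reason such transformations get treated as cosmetic. The coefficient itself, though, changes by a factor of 10, and that matters once you care about what the number means.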

Once you take a Bayesian approach, however, and think of your coefficients as not being mathematical abstractions but actually having some meaning, you move naturally into model building and transformations.

P.S. On page 426, Kaul and Diamond recommend that, in subgroup analysis, researchers “perform adjustments for multiple comparisons.” I’m ok with that, as long as they include multilevel modeling as such an adjustment. (See here for our discussion of that point.)

P.P.S. Also don’t forget economist James Heckman’s argument, from a completely different direction, as to why randomized experiments should not be considered gold standard. I don’t know if I agree with Heckman’s sentiments (my full thoughts are here), but they’re definitely worth thinking about.

]]>