Coblogger John Sides quotes a probability calculation by Eric Lawrence that, while reasonable on a mathematical level, illustrates a road-to-error-is-paved-with-good-intentions attitude that bothers me, and that I see a lot in statistics and quantitative social science.
I'll repeat Lawrence's note and then explain what bothers me.
In today's Wall Street Journal, Nate Silver of 538.com makes the case that most people are "horrible assessors of risk." . . . This trickiness can even trip up skilled applied statisticians like Nate Silver. This passage from his piece caught my [Lawrence's] eye:
"The renowned Harvard scholar Graham Allison has posited that there is greater than a 50% likelihood of a nuclear terrorist attack in the next decade, which he says could kill upward of 500,000 people. If we accept Mr. Allison's estimates--a 5% chance per year of a 500,000-fatality event in a Western country (25,000 casualties per year)--the risk from such incidents is some 150 times greater than that from conventional terrorist attacks."
Here Silver makes the same mistake that helped to lay the groundwork for modern probability theory. The idea that a 5% chance a year implies a 50% chance over 10 years suggests that in 20 years, we are certain that there will be a nuclear attack. But . . . the problem is analogous to the problem that confounded the Chevalier de Méré, who consulted his friends Pascal and Fermat, who then derived several laws of probability. . . . A way to see that this logic is wrong is to consider a simple die roll. The probability of rolling a 6 is 1/6. Given that probability, however, it does not follow that the probability of rolling a 6 in 6 rolls is 1. To follow the laws of probability, you need to factor in the probability of rolling 2 6s, 3 6s, etc.
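The die-roll version of Lawrence's point is easy to check numerically; a minimal sketch, using the complement rather than summing over the 2-sixes, 3-sixes, etc. cases:

```python
# P(at least one six in six rolls) is NOT 6 * (1/6) = 1.
# Work with the complement: the chance of rolling *no* six in six
# independent rolls, then subtract from 1.
p_no_six = (5 / 6) ** 6          # each roll independently misses the six
p_at_least_one = 1 - p_no_six
print(round(p_at_least_one, 3))  # prints 0.665, not 1.0
```

The same complement trick is what Lawrence applies to the nuclear-attack problem below.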
So how can we solve Silver's problem? The simplest way turns the problem around and solves for the probability of not having a nuclear attack. Then, preserving the structure of yearly probabilities and the decade range, the problem becomes P(no nuclear attack in ten years) = .5 = some probability p raised to the 10th power. After we muck about with logarithms and such, we find that our p, which denotes the probability of an attack not occurring each year, is .933, which in turn implies that the annual probability of an attack is .067.
But does that make a difference? The difference in probability is less than .02. On the other hand, our revised annual risk is a third larger. . . .
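Lawrence's back-solve, and the comparison to Silver's simple division by ten, can be written out in a few lines (assuming, as Lawrence does, independent and identical years):

```python
# P(no attack in ten years) = 0.5 = (1 - q)**10; solve for the
# annual attack probability q under the independence assumption.
p_no_attack_per_year = 0.5 ** (1 / 10)   # about 0.933
q = 1 - p_no_attack_per_year             # about 0.067

# Compare with Silver's naive per-year figure of 50% / 10 = 5%.
naive = 0.05
print(round(q, 3))          # prints 0.067
print(round(q - naive, 3))  # prints 0.017 -- a difference under .02
print(round(q / naive, 2))  # prints 1.34 -- about a third larger
```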
OK, so Lawrence definitely means well; he's gone to the trouble to write this explanatory note and even put in some discussion of the history of probability theory. And this isn't a bad teaching example. But I don't like it here. The trouble is that there's no reason at all to think of the possibility of a nuclear terrorist attack as independent in each year. One could, of course, go the next step and try a correlated probability model--and, if the correlations are positive, this would actually increase the probability in any given year--but that misses the point too. Silver is making an expected-value calculation, and for that purpose, it's exactly right to divide by ten to get a per-year estimate. Beyond this, Allison's 50% has got to be an extremely rough speculation (to say the least), and I think it confuses rather than clarifies matters to pull out the math. Nate's approximate calculation does the job without unnecessary distractions. Although I guess Lawrence's comment illustrates that Nate might have done well to include a parenthetical aside to explain himself to sophisticated readers.
This sort of thing has happened to me on occasion. For example, close to 20 years ago I gave a talk on some models of voting and partisan swing. To model votes that were between 0 and 1, we first did a logistic transformation. After the talk, someone in the audience--a world-famous statistician whom I respect a lot (but who doesn't work in social science)--asked about the transformation. I replied that, yeah, I didn't really need to do it: nearly all the vote shares were between 0.2 and 0.8, and the logit was close to linear in that range; we just did logit to be on the safe side. [And, actually, in later versions of this research, we ditched the logit as being a distraction that hindered the development of further sophistication in the aspects of the model that really did matter.] Anyway, my colleague responded to my response by saying, No, he wasn't saying I should use untransformed data. Rather, he was asking why I hadn't used a generalized linear model; after all, isn't that the right thing to do with discrete data? I tried to explain that, while election data are literally discrete (there are no fractional votes), in practice we can think of congressional election data as continuous. Beyond this, a logit model would have an irrelevant-because-so-tiny sqrt(p(1-p)/n) error term which would require me to add an error term to the model anyway, which would basically take me back to the model I was already starting with. This point completely passed him by, and I think he was left with the impression that I was being sloppy. Which I wasn't, at all. In retrospect, I suppose a slide on this point would've helped; I'd just assumed that everyone in the audience would automatically understand the irrelevance of discrete-data models to elections with hundreds of thousands of votes.
I was wrong and hadn't realized the accumulation of insights that any of us gather when working within an area of application, insights which aren't so immediately available to outsiders--especially when they're coming into the room thinking of me (or Nate Silver, as above) as an "applied statistician" who might not understand the mathematical subtleties of probability theory.
P.S. Conflict-of-interest note: I post on Sides's blog and on Silver's blog, so I'm conflicted in all directions! On the other hand, neither of them pays me (nor does David Frum, for that matter; as a blogger, I'm doing my part to drive down the pay rates for content providers everywhere), so I don't think there's a conflict of interest as narrowly defined.
I post on Sides's blog and on Silver's blog, so I'm conflicted in all directions! On the other hand, neither of them pays me (nor does David Frum, for that matter
Best conflict-of-interest statement I've ever read. Well done!
Just wanted to say that I am enjoying your blog here. Some of the technical portions go over my level of statistics; however, I find the statistical view on everyday things to be quite informative.