# Political polling

I heard it again the other night. One of the TV chin strokers talking about this poll or that poll showing Obama (or McCain) ahead with a "statistically insignificant" lead, and I thought to myself, no one who knew much about statistics would use a phrase like that. Strictly speaking, while there may be something like statistical significance, there is no statistical insignificance. It is a nonsensical term that is becoming part of the language by use, so I know I can't stop it with a blog post. If I could, I would, because it invites serious misunderstanding in the speaker and listener alike. So let's recap what's wrong with saying "statistical insignificance."

What's intended by the statement? Probably something like, "Although a poll was taken showing Obama ahead by 1 percentage point, if an official election were held at the time the poll was taken we might find Obama would lose by 2 percentage points" -- or something like that. I think I'm being rather generous to the understanding of the pundits by spelling it out like this, but I'm in a generous mood. So let's first agree on the purpose of political polling. Presumably it's to estimate the relative sizes of Obama's and McCain's votes if the election were held on the day or days the poll was taken. You could determine this definitively if you found everybody who was going to vote in the election, asked them what their votes would be, and you could be assured they were telling you the truth. Let's grant the last assumption, since we wish to discuss statistics and not psychology or sociology.

Of course if we did poll everyone, it would be almost the same thing as holding the election, except that in the form of a poll we would immediately be faced with a problem: determining who was actually going to vote that day. We know that the possibilities on voting day are limited to those eligible to vote in that jurisdiction. But not all registered voters vote in every election. Different pollsters use different methods to figure out who a "likely voter" is, and differences in "likely voter models" (the characteristics of a person that predict whether they will vote or not) vary among pollsters, so their polling results will vary too. After all, they are asking a different group of people if their models are different. To add to the problem, significant numbers of people have yet to register, so a likely voter model can't just confine itself to registered voters. Likely voters are not simply a subset of registered voters (nor vice versa). The only firm requirement is that you be legally eligible to vote. Pollsters don't ask 12 year olds for their opinions.

The statistical methods pollsters use are solidly based in probability theory but they assume we know the kind of probabilistic process that generated the data. We are going to assume a very simple probability model for how Obama/McCain votes are generated. We will model the propensity to vote for one of the two candidates by the outcome of a coin flip, where the probability that the coin comes up heads or tails is directly related to whether any person in the polled population will vote for Obama (heads) or McCain (tails). You might think this a very unrealistic model, but we are letting the abstract "randomness" of coin flipping stand for all the unknown factors that determine how an individual person votes. We are only interested in the summary behavior of all the voters, the probability they will vote for one or the other candidate.
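This coin-flip model is easy to make concrete. Here is a minimal sketch in Python; the function name, the 0.51 probability, and the random seed are all illustrative assumptions, not figures from any actual poll:

```python
import random

def simulate_voter(p_obama, rng):
    # One "coin flip": heads (Obama) with probability p_obama, else tails (McCain).
    return "Obama" if rng.random() < p_obama else "McCain"

# A hypothetical electorate whose true propensity for Obama is 0.51.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
votes = [simulate_voter(0.51, rng) for _ in range(100_000)]
print(votes.count("Obama") / len(votes))  # close to 0.51
```

Flip the coin enough times and you recover the underlying probability; that is all a poll is trying to do, with far fewer flips.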

Now if these probabilities aren't equal ("50-50") the coin isn't a fair coin. It's a biased coin, with a tendency to turn up heads more often than tails or the reverse, depending on how it is weighted. The job of the pollster is to accurately estimate the probabilities of heads and tails in the election coin. If the probabilities are, say, .51 Obama, .49 McCain, then on average 51% of the voters would vote Obama and 49% McCain. So the pollsters are trying to estimate the chance that heads or tails will come up by flipping the coin, which corresponds to asking people whom they will vote for. Asking 100 representative voters is like flipping the coin 100 times, etc. With that underlying probability model (or one more complicated), we can do some statistics, i.e., look at some actual data from a representative sample and interpret what we see about what the much larger population of voters will do. This process of using a representative smaller sample to tell us about a much bigger population is called statistical inference. We are inferring the "big" vote from the much smaller poll.

Each time you do the coin flipping experiment (whether each experiment is 10 flips or 100 flips or 1000 flips) you will get a number for the proportions of heads and tails. It won't be the same number each time. Once maybe it will be 53 heads and 47 tails. Another time it might be 49 heads and 51 tails, etc. Each set of 100 flips is called a sample, and the proportion of heads (or tails) in each trial of 100 flips is called a sample statistic. A statistic is a number calculated from your data. The sample statistic will jump around a little bit with each set of 100 (in the example 53, 49, . . .) and if you did this 100 coin flip experiment over and over again you would get a whole slew of these numbers. That collection of numbers is called the sampling distribution of the statistic. In the polling example, the sample statistic is the proportion of respondents who said "Obama" when asked whom they prefer. To get a sampling distribution for the polling example you would have to take the poll over and over again from a representative sample, asking the same questions. Pollsters don't do this. They only do it once. So they want to be sure their single sample statistic is pretty close to the true propensity to vote for Obama in the voter population, which is the number they are really interested in but don't know.
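The sampling distribution can be simulated directly. This sketch (an assumed true propensity of 0.51 and a fixed seed, both purely illustrative) repeats the 100-flip experiment many times and watches the sample statistic jump around:

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility

def sample_proportion(p, n, rng):
    # One poll of n respondents: the proportion answering "Obama".
    return sum(rng.random() < p for _ in range(n)) / n

# Repeat the 100-respondent poll 10,000 times; each result is one sample statistic.
stats = [sample_proportion(0.51, 100, rng) for _ in range(10_000)]
print(min(stats), max(stats))   # individual polls scatter widely
print(sum(stats) / len(stats))  # but the distribution centers near 0.51
```

A real pollster sees only one of those 10,000 numbers, which is exactly why the spread of this distribution matters.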

The way they do this is to increase the number of coin flips. The underlying mathematics of this depends on a deep mathematical result called the Law of Large Numbers, but essentially this is what it says. The more often you flip the coin, the more likely it is that the sample statistic (the proportion of heads, or the propensity to vote for Obama) will be close to the true underlying probability you are interested in. Using statistical theory we know that if you flip the polling coin around 1000 times, your sample statistic (the proportion of heads, or the Obama vote) rarely gets very far from the true underlying value. "Not very far" means rarely more than plus or minus three points from the true value in this case (that's the "margin of error" you read about). It is possible that sometimes the error is greater, but not too often (in this case just a probability of a few hundredths). Thus, using this method you can get information on the probability of an Obama/McCain vote by asking only 1000 people instead of 100 million.
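The plus-or-minus-three-points figure follows from the standard normal-approximation formula for a proportion. A quick sketch, assuming the conventional 95% multiplier of 1.96 and the worst case p = 0.5 (the case pollsters usually quote):

```python
import math

def margin_of_error(n, p=0.5):
    # 95% margin of error for a proportion estimated from n independent
    # respondents; it is widest when p = 0.5.
    return 1.96 * math.sqrt(p * (1 - p) / n)

print(round(100 * margin_of_error(1000), 1))  # about 3.1 percentage points
```

Because the margin shrinks like one over the square root of n, quadrupling the sample only halves the margin, which is why polls settle around a thousand respondents.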

The other problem is that the words "significant" and "insignificant" can be misread as "important" and "not important" (the common uses of the word). But statistically significant differences can be of no importance for many purposes while in politics a difference of one vote can be the difference between winning and losing. That's a problem with the whole "significant" terminology, which unfortunately is even harder to fix.

You can say my complaint about the use of the phrase "statistical insignificance" is a bit of a straw man argument, and I'll grant you it is. But it gives us a chance to talk about a lot of things related to it.

And anyway, it annoys me.



Interesting point, but one complaint. Your statement
"The underlying mathematics of this depends on a deep mathematical result called the Central Limit Theorem" should have referred to the Law of Large Numbers. The CLT explains why the behavior of the different sample proportions settles down to a normal law: the LLN explains why we can be sure the proportions eventually get close to the "true" percentage.

I think you're being a little fussy here, aren't you?

You may not like the actual phrase, but it's surely better to make it clear that there's really nothing in it between Obama and McCain if there is only one percentage point between them. It has to be better than the totally accurate - but misleading - tabloid headline of "Obama leads McCain in the polls" !

Dean: My face is red. Mental lapse on my part. You are completely correct. I have corrected it in the post. Many thanks.

Martin: That's why I included the paragraph at the end. It annoys me.

It's as annoying as saying it's a statistical tie or a dead heat.

45-45 are those things. Obama up by one or McCain up by one is neither. If it were so, then FL 2000 would have been declared a statistical tie and Bush and Gore would each have had a desk in the Oval Office, or some such nonsense.

"Too close to know" or "well within the margin of error" is really more appropriate.

If you are a journalist or an ordinary informed citizen, do check out the 20 questions from the National Council on Public Polling, especially # 19:

What else needs to be included in the report of a poll?
The key element in reporting polls is context. Not only does this mean that you should compare the poll to others taken at the same time or earlier, but it also means that you need to report on what events may have impacted on the poll results.
A good poll story not only reports the results of the poll but also assists the reader in the interpretation of those results. If the poll shows a continued decline in consumer confidence even though leading economic indicators have improved, your report might include some analysis of whether or not people see improvement in their daily economic lives even though the indicators are on the rise.
If a candidate has shown marked improvement in a horse race, you might want to report about the millions of dollars spent on advertising immediately prior to the poll.
Putting the poll in context should be a major part of your reporting

By DemFromCT (not verified) on 11 Sep 2008 #permalink

So, talking about "statistically insignificant" is a bit like talking about a hole in the International Space Station in terms of how much vacuum leaks into the station?

I only mean this as real light criticism but in the spirit of statistical "pet peeves" you actually got into one of mine. While you did a nice job explaining sampling variance and confidence intervals you relied on the well worn path of coin-tosses to do it. I don't have a huge problem with that in this context but it does raise my pet peeve about assuming independence of the sample (like coin tosses) in a context where there is no such independence (like voting).

This is actually most annoying in sports where some otherwise smart people make statements about related events using statistics that assume the events are independent when they are most definitely not.

By floormaster squeeze (not verified) on 11 Sep 2008 #permalink

floormaster: Isn't the question here whether the events are independent? Thus A's answer doesn't affect B's answer and vice versa, i.e., P(A and B)= P(A)*P(B).
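The practical consequence of non-independence is easy to see numerically. In this sketch (all numbers invented for illustration), respondents who come in "households" of five that answer identically contribute much less information than five independent respondents, so the sample proportion scatters more:

```python
import random
import statistics

rng = random.Random(1)
p = 0.51  # assumed true propensity, for illustration only

def independent_poll(n):
    # n independent respondents, like n separate coin flips.
    return sum(rng.random() < p for _ in range(n)) / n

def clustered_poll(n, cluster=5):
    # Households of 5 that all answer alike: only n/cluster independent draws.
    hits = sum(cluster if rng.random() < p else 0 for _ in range(n // cluster))
    return hits / n

ind = [independent_poll(1000) for _ in range(2000)]
clu = [clustered_poll(1000) for _ in range(2000)]
print(statistics.stdev(ind))  # roughly 0.016
print(statistics.stdev(clu))  # roughly 0.035: correlated answers widen the spread
```

Both polls "ask" 1000 people, but the clustered one behaves like a poll of only 200, so a margin of error computed under the independence assumption understates the true uncertainty.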

"Many telephones are businesses, so they aren't of interest, and many people whose views are of interest may not be reachable that way: they don't have a land line (e.g., no phone or only a cell phone) or they may not be home when you call or they may refuse to answer your questions"
Are polls only using land-lines? I would think this would definitely skew results, as a large % of under-thirties only use cell phones. And does the use of answering machines and call-screening affect results? Any answers? Mahalo, Terry McNeely

By terry mcneely (not verified) on 11 Sep 2008 #permalink

terry: Federal regulations do not allow autodialing of cell phone numbers. They have to be manually entered. Accepting such a call potentially costs money, so there are restrictions. I don't know if there are even lists of all cell phone numbers the way there are of land line numbers. The cell phone under count is of serious concern to pollsters because it is known that cell phone only people are substantially different than those that can be reached by a landline (although they may also have a cell phone).

Revere,
thank you. i suspect a larger proportion of cell-phone only people prefer Obama over McCain.

By terry mcneely (not verified) on 11 Sep 2008 #permalink

Revere, that was a very interesting (albeit verbose) explanation of the issue. :)

Completely aside from politics and the ignorance of pollsters, this smacks of a common trap that many scientists fall into. I.e., if the p-level isn't .05 or less, there is no result.

First of all, the 95% confidence interval (p=.05) is a completely arbitrary number. Granted, it has since achieved practically sacrosanct status in the scientific community, but really, .05 is just a number. I understand the general inclination for some sort of "standard" per se, but people should understand that confidence intervals and p-levels are a continuum, not a cliff that drops off at 5%.

As you alluded to in your post, significance is really a function of two things: sample size and effect size. A truly thoughtful person needs to understand all three statistical components (p-level, sample size, and effect size) to fully understand what is going on in the world. For example, if a person had an effect size r (or a correlation) of .30 (generally signifying a moderately strong effect), but the p-level was .06 (gasp!) does that mean that the result is insignificant (i.e. not real), since p was not less than .05? No, of course not. It may have just been an issue of too small a sample size.
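The point that a p-value depends on sample size, not just effect size, can be sketched with a crude normal approximation (z ≈ r·√n); the numbers here are illustrative and this is not an exact test for a correlation:

```python
import math

def approx_p_two_sided(r, n):
    # Rough two-sided p-value for a correlation r observed in n cases,
    # via the normal approximation z = |r| * sqrt(n). Illustrative only.
    z = abs(r) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2))

# The same moderate effect (r = .30) lands on either side of the .05 line
# purely as a function of how many observations you have:
for n in (30, 50, 100):
    print(n, round(approx_p_two_sided(0.30, n), 3))
```

Nothing about the effect changed between those runs; only the sample size did, which is exactly why "p > .05" is not the same as "no effect."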

On the flip-side, effect size isn't the entire story either. My stats professor in grad school used to love telling the classic aspirin story... as he told it, when scientists were measuring the effect of aspirin on heart attacks, they found a miniscule pathetic effect size r of .04. That's so close to zero, it's practically nothing. But once the researchers found this effect size, they immediately stopped the study because they thought that it would be unethical to not give heart disease patients aspirin. When compared to the number of patients at risk for heart attacks, that itty bitty .04 effect size translated into thousands of lives saved.

Anywho, the moral of the story, for all those scientists, researchers, and statisticians out there... make sure you understand the entire picture, instead of relying on just one statistic.

On a different note, Terry that's an excellent point. More and more people are using cell phones as their sole phone line, and generally these people a little different from the general population. I hope the telephone survey-ers in the world are taking note.

the cell phone Q is an interesting one... gallup, CBS and others include some cell phones. if you're really into polling, check out www.pollster.com for more.

By DemFromCT (not verified) on 11 Sep 2008 #permalink

Tasha: Yes, all your points are correct. The egregious error of thinking that not rejecting the null is the same as accepting the null is extremely common. There was much more I could have said (verbose as I was) and the confusion between public health significance and statistical significance is one of my hobby horses. Things can be of public health significance and not statistical significance and vice versa. I purposely didn't comment on the 0.05 level because there is too much to say about it (I once tried to find its historical origins but couldn't pin it down completely), except that most biologists and public health folks aren't aware that physicists frequently use .10 as a convention. But we know that's a soft science!

Dem: Thanks for the link. The cell phone Q. is indeed interesting. I'll head over to read it now.

Revere: You are right about noting I was talking about assuming the voting events are independent. It may be of little importance as most of the voting events are independent.

The issue of cell phones in polling really isn't that new--that is, if you honestly want to control for it you can without too much of an effect on the variance. It is getting worse, I suppose, though.

Random dialing modes were always biased--larger households and larger incomes had more phone lines and there has always been a percent of people who did not have a phone. This was a problem in most of the 90's for RDD health insurance surveys as not having a phone was shown to be correlated highly with not having health insurance in the Census/BLS (CPS) data.

By floormaster squeeze (not verified) on 12 Sep 2008 #permalink