A Tale of Two Polls, or What I Learned from 16S rRNA Microbiome Analysis

I'll return to the Research 2000 poll I discussed Wednesday, and also talk about this Gallup poll Digby discusses (and, I think, misinterprets), because we have to think carefully about the data we're collecting--and the questions in those polls really are different in quality from each other. But first, the 16S rRNA.

Something that's applicable to many fields is that you have to understand the limitations of your data, not just the strengths. You also have to imagine what the data would look like given certain outcomes: given scenario X, we would expect to see A, and given scenario Y, we would expect to see B--if you can't really tell A and B apart, then you've reached the limits of resolution for your data type.

One way to determine what bacteria live on you and in you--the microbiome, which has implications for disease and health--is to sequence a gene found in all bacteria known as 16S rRNA; think of it as a universal barcode of life. The good news is that we can sequence hundreds of thousands of these genes quickly. The not-so-good news is that there's an error rate of around one percent*. Typically, we sequence around 400 nucleotides, which means we get, on average, four errors per read. But that's the average. (I'm getting to the polls, I swear.)

In many microbial communities there is a dominant species. In the vaginal tract (VAGINAZ!!! AIIIEEE!!!), Lactobacillus can account for ~95% of the organisms (or even more): that is, out of 10,000 reads, ~9,500 will be from Lactobacillus. With that many reads from a single organism, some of them will have more than four errors--lots more. In fact, a few reads will have so many errors that we will either classify them as a different genus, or even think we've discovered something new, when this is just sequencing error. The irony is that less sequencing (heaven forbid!) would give you a more accurate answer (or at least avoid the false positives). In other words, we have to understand that if we sequence deeply in skewed communities, we run the risk of 'discovering' non-existent microbial diversity.
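To put rough numbers on that tail, here is a minimal sketch in Python. It assumes every base is miscalled independently with the same one percent probability (a simple binomial model, which is my simplification; real sequencing errors cluster and vary by position), and it uses the 400-nucleotide reads and ~9,500 Lactobacillus reads from above. The function names and the error thresholds in the loop are illustrative only.

```python
from math import comb

READ_LENGTH = 400    # nucleotides per read (from the text)
ERROR_RATE = 0.01    # ~1% per-base error rate (from the text)
LACTO_READS = 9500   # ~95% of 10,000 reads (from the text)


def prob_at_least(k, n=READ_LENGTH, p=ERROR_RATE):
    """P(X >= k) for X ~ Binomial(n, p): the chance that a single
    read of n bases carries k or more errors.
    Assumes independent, identically likely errors (a simplification)."""
    below_k = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below_k


def expected_reads_with_at_least(k, reads=LACTO_READS):
    """Expected number of reads, out of `reads`, with k or more errors."""
    return reads * prob_at_least(k)


# Average errors per read: 400 * 0.01 = 4, as stated above.
mean_errors = READ_LENGTH * ERROR_RATE

if __name__ == "__main__":
    print(f"mean errors per read: {mean_errors:.1f}")
    # Hypothetical thresholds: two, three, and four times the average.
    for k in (8, 12, 16):
        print(f"expected reads with >= {k} errors: "
              f"{expected_reads_with_at_least(k):.2f}")
```

The point of the sketch is the scaling: the per-read probability of a heavily mangled read is fixed by read length and error rate, so the expected count of such reads grows in direct proportion to how many reads come from the dominant organism. Halve the depth and you halve the number of phantom 'novel' sequences.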

You have to understand the limitations of your methods.

On to the Gallup poll, which reported:

Half of Americans (50%) believe government should do less to regulate business. The rest are divided between saying things are about right (23%) and that the government should regulate more (24%). The majority (57%) are worried that there will be too much government regulation of business.

While I'm not currently in the for-profit private sector, my experience has led me to believe that people are reacting to a lot of stupid 'paperwork' issues that either cost a lot of money to deal with (e.g., you have to hire someone who understands how to do all of this) or are a headache for a business owner or employee who isn't really good at navigating this stuff (e.g., a plumber wants to and knows how to do plumbing, not fill out forms). At a previous job, I was thrust into this role (at many universities, there are professionals who deal with this), and the 'regulation' drove me nuts.

But if Gallup had asked people things like:

1) do you think businesses should be allowed to pollute when there is a safer alternative?

2) do you think workers should be placed in unsafe conditions?

the respondents would be much more pro-regulation. Also, if you changed 'businesses' in question #1 to 'corporations', you would probably see an even higher pro-regulation response. We have to be really careful in interpreting polling questions, particularly when a complex phenomenon is boiled down to a binary response (although those binary responses are statistically tractable, if not necessarily always accurate). In other words, ask yourself how a person who has been frustrated by paperwork, but doesn't think toxic waste should be dumped into the local water supply, feels about "regulation." Very subtle things in the questions could really shift the answers.

Think of it this way: both centrist-liberal Paul Krugman and wackaloon Judd Gregg would probably answer that Obama hasn't done a great job managing the economy, but for very different reasons (Krugman would like more deficit spending, Gregg foolishly less).

This brings me to the Research 2000 poll. A couple of commenters argued that this poll shouldn't be taken seriously, I think in large part because the results are hard to believe. But the questions aren't really open to interpretation: either Obama is a socialist, in one's opinion, or he isn't. Either he hates white people, or he doesn't. There's nothing vague in the poll whatsoever--which is why the results are so shocking. Could one come up with some questions that Democrats would answer 'crazy', or for which there is 'bipartisan' crazy? Sure.

But, unless you think these Republicans are pulling the pollster's leg (and one person argued that only crazy people answer polls--in which case, every poll is bullshit, which could be), those responses really are nuts and not open to multiple interpretations. Thirty years ago, even in Virginia (where I grew up), people who did believe Obama (or in that case, Doug Wilder) was a 'racist who hates white people' would have never said that to a complete stranger over the phone--certainly not a third of the respondents. To the extent there's a 'bias' here, it's that a lot of people feel validated enough to say this crazy shit out loud. To other people.

If you don't like the data because they make you uncomfortable, you don't ignore them (that's what creationists and other denialists do). Look, I think Republican economics are nuts, but people can disagree about policy issues, confront these arguments with data, and so on. But we are so far beyond marginal tax rates here. Do you think I want to live in a country where the base of one party is out of its mind? Where more than forty percent of that rank and file respond to the idea of secession with anything other than "Hell no!"?

Of course I don't. But we do. And we have to fix that--or at least, recognize that it exists. Clicking our heels and pretending it will go away won't get the job done.

Related post: Conservative economist Bruce Bartlett is disgusted by the results--and he also provides a very nice table. Most of the commenters also point out that Research 2000 is a legit polling operation.

Tangential aside: Having been involved with public health-related polling, I can tell you that one reason Democrats weren't asked these questions is that polling is expensive--you're talking about tens of thousands of dollars more to double the sample.

With regards to dealing with species over-representation, do people ever use hybridization (to magnetic beads or glass slides) to subtract out the known highly-present species?

We do this (well, we pay a company in Russia to do this) for cDNA libraries that are used for sequencing purposes. And it's the same general idea that we attempt to use for RNA-seq.

(subtraction of rRNAs, which sucks and doesn't work and even when the bioanalyzer says there's 0% rRNA you still get 95% of your reads mapping there).

In the example of the human vaginal microflora, I would think subtracting out the known lactobacilli 16S rRNAs prior to cDNA / library construction would help with that a little bit, and allow either less sequencing to accomplish the same depth or the same sequencing to identify rare species.

That said, I'm assuming people have either already done this or considered it etc.

... which is why the results are so shocking.

Uh ... "shocking"? Really?
That poll tells us exactly what the Republican party pundits have been so pridefully boasting about for the last few years.
But perhaps you don't speak their language.
