One last word on the Geiers: So good it shouldn't be buried in the comments

By oracknows on March 6, 2006.

I was going to give this a rest for a while, but this is too good not to post a brief note about.

In the comments of my piece debunking the Geiers' pseudoscience, and their laughable "scientific" article claiming to show a decrease in the rate of new cases of autism since late 2002 (when thimerosal was removed from all vaccines except some flu vaccines), one MarkCC posted this gem of a comment. It states the essence of what was wrong with the Geiers' so-called "statistical analysis" of the VAERS database:

Here's the key, fundamental issue: when you're doing statistical analysis, you don't get to look at the data and choose a split point. What the Geiers did was to look at the data and find the best point for splitting the dataset to create the result they wanted. There is no justification for choosing that point except that it's the point that produces the result that they, a priori, decided they wanted to produce.
Time trend analysis is extremely tricky to do - but the most important thing in getting it right is doing it in a way that eliminates the ability of the analysis to be biased in the direction of a particular a priori conclusion. (In general, you do that not to screen out cheaters, but to ensure that whatever correlation you're demonstrating is real, not just an accidental correlation created by the human ability to notice patterns. It's very easy for a human being to see patterns, even where there aren't any.)

Redo the Geiers' analysis using any decent time-trend analysis technique, even a trivial one like running multiple overlapping three-year regressions (i.e., fit the data from '92 to '95, '93 to '96, '94 to '97, etc.), and you'll find that the nice clean break point in the data doesn't really exist. You'll get a series of trend lines with different slopes, without any clear break in slope or correlation.

So - to sum up the problem in one brief sentence: in statistical time-series analysis, you do not get to pick break points in the time sequence by looking at the data and choosing the break point that is most favorable to your desired conclusion.
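MarkCC's overlapping-window check is easy to sketch. The Python below is an illustration, not anything from the Geiers' paper or from MarkCC. The function names are hypothetical. Each window runs from a start year to three years later, which reads '92 to '95 as four yearly points. The code only fits ordinary least-squares slopes and does no significance testing.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def rolling_slopes(years, values, span=3):
    """Fit a separate line to each overlapping window [start, start + span].

    With span=3 and yearly data the windows are 1992-1995, 1993-1996,
    1994-1997, ... Returns a list of (start, end, slope) tuples."""
    pairs = sorted(zip(years, values))
    last_year = pairs[-1][0]
    out = []
    for start, _ in pairs:
        end = start + span
        if end > last_year:
            break  # no complete window left
        window = [(x, y) for x, y in pairs if start <= x <= end]
        xs = [x for x, _ in window]
        ys = [y for _, y in window]
        out.append((start, end, ols_slope(xs, ys)))
    return out

if __name__ == "__main__":
    # Hypothetical noisy series with one steady trend (not real VAERS data).
    rng = random.Random(3)
    years = list(range(1992, 2005))
    values = [50 + 4 * (y - 1992) + rng.gauss(0, 6) for y in years]
    for start, end, slope in rolling_slopes(years, values):
        print(f"{start}-{end}: slope {slope:+.2f}")
```

A genuine break should show up as a run of windows before it sharing one slope and a run after it sharing another. MarkCC predicts that the Geiers' data would instead give slopes that wander without any clean step.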

Exactly! Unfortunately, that's exactly what the Geiers did.
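A small simulation shows why that matters. The sketch below is Python, and the series length, trend, noise level, and function names are invented for illustration rather than taken from the Geiers' data. Every simulated series is a single straight line plus noise, so there is no break at all. Even so, picking the split that maximizes the change in slope always reports at least as large a change as a split fixed in advance, and typically a considerably larger one. The data-driven search manufactures a "break" out of noise.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def slope_change(xs, ys, k):
    """Absolute difference between the slopes fitted before and after index k."""
    return abs(ols_slope(xs[k:], ys[k:]) - ols_slope(xs[:k], ys[:k]))

def snooped_split(xs, ys, min_seg=3):
    """The practice being criticized: look at the data and keep whichever
    split index produces the most dramatic change in slope."""
    candidates = range(min_seg, len(xs) - min_seg + 1)
    return max(candidates, key=lambda k: slope_change(xs, ys, k))

def no_break_series(rng, n=13, trend=1.0, sd=2.0):
    """Hypothetical yearly series: one straight line plus Gaussian noise, no break."""
    return [10.0 + trend * t + rng.gauss(0.0, sd) for t in range(n)]

if __name__ == "__main__":
    rng = random.Random(1)
    xs = list(range(13))
    fixed_k = 6  # split chosen in advance, without looking at the data
    snooped, fixed = [], []
    for _ in range(1000):
        ys = no_break_series(rng)
        snooped.append(slope_change(xs, ys, snooped_split(xs, ys)))
        fixed.append(slope_change(xs, ys, fixed_k))
    print("mean slope change at pre-chosen split:", sum(fixed) / len(fixed))
    print("mean slope change at snooped split:   ", sum(snooped) / len(snooped))
```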

A proper statistical analysis of such data, looking for time points at which the rate of change of a variable itself changes, is designed so that there is no bias in selecting a time point at which a significant change in slope is observed. As much as the Geiers might want to believe that there is a marked change in the slope of the curve beginning around late 2002 to early 2003, they can't assume that there is such a breakpoint before doing the analysis.
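Here is one reasonable way to search for a break without building in that bias, sketched in Python under stated assumptions. First, compute a Chow-type F statistic (one line versus two) at every candidate split and keep the largest. Then judge that maximum against a null distribution produced by running the same search on series simulated with no break. In this sketch the simulation is a residual bootstrap from a single straight-line fit. That choice, the minimum segment length of three points, and the function names are assumptions for illustration. Resampling residuals independently also ignores any autocorrelation a real time series may have.

```python
import random

def ols_fit(xs, ys):
    """Least-squares (intercept, slope) for a single straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def rss(xs, ys):
    """Residual sum of squares of the least-squares line."""
    a, b = ols_fit(xs, ys)
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def sup_f(xs, ys, min_seg=3):
    """Largest Chow-type F statistic over every candidate break index.

    Compares one line for all the data (2 parameters) with separate lines
    before and after the break (4 parameters)."""
    n = len(xs)
    rss_one = rss(xs, ys)
    best = 0.0
    for k in range(min_seg, n - min_seg + 1):
        rss_two = rss(xs[:k], ys[:k]) + rss(xs[k:], ys[k:])
        if rss_two == 0.0:
            return float("inf")  # two lines fit the data perfectly
        f = ((rss_one - rss_two) / 2) / (rss_two / (n - 4))
        best = max(best, f)
    return best

def break_test(xs, ys, n_boot=999, seed=0):
    """Bootstrap p-value for "is there a slope break anywhere?".

    The null distribution comes from running the same search over splits on
    series simulated from a single no-break line (residual bootstrap), so the
    freedom to pick the best split is paid for in the p-value."""
    rng = random.Random(seed)
    observed = sup_f(xs, ys)
    a, b = ols_fit(xs, ys)
    fitted = [a + b * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    exceed = sum(
        sup_f(xs, [f + rng.choice(resid) for f in fitted]) >= observed
        for _ in range(n_boot)
    )
    return observed, (exceed + 1) / (n_boot + 1)
```

The design choice that matters is that the simulated series go through exactly the same "find the best split" search as the real data. Comparing the winning split against an ordinary F table would repeat the Geiers' mistake in a more respectable-looking form.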

Once again, what pseudoscientists like the Geiers never seem to understand is that all those precautions we scientists take with control groups and statistical analyses designed to minimize investigator bias exist because we realize how easy it is for a scientist, particularly a medical scientist who is invested in finding a cure for a particular disease or condition, to be seduced into believing something that is not supported by data. (If they did understand, they wouldn't use such simplistic and easily debunked "scientific" methodology.) It's a very human tendency, and the scientific method is designed to minimize that tendency. That's why it takes so much training to overcome.

Some scientists never do overcome this tendency, and if they fall deeply enough into belief over evidence they become pseudoscientists.

Like the Geiers.

Thanks, MarkCC.


In fact, what the Geiers did was a textbook case of the "Texas sharpshooter fallacy," so named from a possibly apocryphal story about a Texan who brags about his target-shooting ability. He stands way back from the side of a barn, fires wildly, hitting it all over the place, and then draws a target around the places he hit.

The point is that in statistics, you can't use the same set of data to generate a hypothesis and then test that hypothesis; if you do so, you're reasoning in a circle. Introductory stats courses do a really bad job of explaining this, just saying "don't look at the data before testing it," which often leads students to something similar to the New Age "interpretation" of quantum mechanics.

Exactly, ebohlman - that's why in microarray experiments or any other sort of biomarker study we first use a 'training set' and then test the hypothesis in a completely different set of experimental subjects.
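The training-set/test-set idea can be sketched the same way. The sketch uses Python, synthetic no-break data, hypothetical function names, and arbitrary sizes. The split is chosen by snooping on one series, then evaluated unchanged on an independent series from the same process. When the "break" was only noise, the slope change seen at the snooped split on the training data typically shrinks substantially on the fresh data.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def slope_change(xs, ys, k):
    """Absolute difference between slopes fitted before and after index k."""
    return abs(ols_slope(xs[k:], ys[k:]) - ols_slope(xs[:k], ys[:k]))

def pick_split(xs, ys, min_seg=3):
    """Exploratory step, allowed only on training data: find the most dramatic split."""
    return max(range(min_seg, len(xs) - min_seg + 1),
               key=lambda k: slope_change(xs, ys, k))

def no_break_series(rng, n=13, trend=1.0, sd=2.0):
    """Synthetic series: one straight trend plus noise, with no real break."""
    return [10.0 + trend * t + rng.gauss(0.0, sd) for t in range(n)]

def train_then_test(trials=500, seed=2):
    """Average slope change at the snooped split, on training vs. held-out data."""
    rng = random.Random(seed)
    xs = list(range(13))
    on_train, on_test = [], []
    for _ in range(trials):
        train = no_break_series(rng)
        test = no_break_series(rng)  # independent data from the same process
        k = pick_split(xs, train)    # hypothesis generated on the training set only
        on_train.append(slope_change(xs, train, k))
        on_test.append(slope_change(xs, test, k))  # same split, fresh data
    return sum(on_train) / trials, sum(on_test) / trials

if __name__ == "__main__":
    print(train_then_test())
```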

Ever see a picture of Dr. Geier?

Does he look like Dracula? Always wanted to meet Dracula.

While I wholeheartedly agree that the abuse of statistics through reading data before positing a hypothesis is dead wrong, there exists a whole field - data analysis - that superficially does just that. Under these circumstances, a respectable scientist has to draw attention to any pitfalls that he/she may perceive when applying inferential procedures and reaching conclusions 'a posteriori'.

Glad you unearthed this gem from the comment pile: I missed it.

While I wholeheartedly agree that the abuse of statistics through reading data before positing a hypothesis is dead wrong, there exists a whole field - data analysis - that superficially does just that. Under these circumstances, a respectable scientist has to draw attention to any pitfalls that he/she may perceive when applying inferential procedures and reaching conclusions 'a posteriori'.

That's a key distinction, isn't it? Real scientists are very careful to qualify the limitations of their methodology, particularly when doing retrospective analyses, which by their very nature are much more prone to bias and incorrect conclusions than prospective studies--even when the data used isn't as questionable as what is contained in the VAERS database. Pseudoscientists don't bother to list the limitations of their analysis or only do so in a very perfunctory fashion, mainly because they don't want to weaken their conclusion, which was usually reached before they ever looked at the data.

The bottom line is that correlation does not equal causation, and the Geiers haven't even been able to demonstrate correlation convincingly.

Here's the key, fundamental issue: when you're doing statistical analysis, you don't get to look at the data and choose a split point.

While I agree with the sentiment of the above statement, I do think a few things ought to be clarified. First, you can choose a split point a priori, e.g., the stock market crashed on March 4; let's collect data on stock prices and see whether the March 4 crash affected them. I didn't see where the Geiers spelled out whether they chose their point before seeing the data or not. Given that VAERS and CDDS data are public, I conservatively assume not. I expect to see some formal hypothesis test that directly addresses the split point. The so-called interrupted time series methods are one good class of methods, although I looked at their CDDS data and decided it didn't need time series analysis after all (no autocorrelation or partial autocorrelation). You can also use regression model-building techniques to address this hypothesis test. Of course, the Geiers tried to justify their change point by comparing the slopes of two lines, and, well, that was rather bizarre.

The other thing to note is that the Geiers did not really perform a changepoint analysis. Take a look again. They overlap their two regression lines by a year. I certainly haven't seen that in the 15 years that I have studied and done statistics.

This paper was certainly not a red letter day for statistics.
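Here is a sketch of the kind of analysis the commenter describes. It fixes the split point before looking at the data and fits separate regression lines to two non-overlapping segments. It then computes a t statistic for the change in slope using the pooled residual variance. As a rough check on whether time-series methods are needed, it also reports the lag-1 autocorrelation of the residuals. The code is Python with hypothetical names. It assumes sorted x values and independent, equal-variance errors. It is not the Geiers' method or the commenter's actual code. With a pre-specified break, t can be compared with a t distribution on dof degrees of freedom (e.g., via scipy.stats.t.sf). With a break picked by searching the data, it cannot.

```python
def lag1_autocorrelation(resid):
    """Lag-1 autocorrelation of residuals listed in time order."""
    mean = sum(resid) / len(resid)
    dev = [e - mean for e in resid]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    return num / sum(d * d for d in dev)

def fit_segment(xs, ys):
    """Least-squares slope, residuals, and Sxx for one segment."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    resid = [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]
    return slope, resid, sxx

def prespecified_break(xs, ys, break_x):
    """Slope-change statistics for a break point fixed before seeing the data.

    Each observation belongs to exactly one segment (x < break_x or
    x >= break_x); unlike overlapping fits, no year is used twice.
    Assumes xs is sorted in time order."""
    left_x = [x for x in xs if x < break_x]
    left_y = [y for x, y in zip(xs, ys) if x < break_x]
    right_x = [x for x in xs if x >= break_x]
    right_y = [y for x, y in zip(xs, ys) if x >= break_x]
    b_left, r_left, sxx_left = fit_segment(left_x, left_y)
    b_right, r_right, sxx_right = fit_segment(right_x, right_y)
    resid = r_left + r_right               # still in time order
    dof = len(xs) - 4                      # two intercepts, two slopes
    s2 = sum(e * e for e in resid) / dof   # pooled residual variance
    se = (s2 * (1 / sxx_left + 1 / sxx_right)) ** 0.5
    change = b_right - b_left
    return {
        "slope_change": change,
        "std_error": se,
        "t": change / se,
        "dof": dof,
        "lag1_autocorr": lag1_autocorrelation(resid),
    }
```

A lag-1 autocorrelation near zero is consistent with the commenter's finding that ordinary regression sufficed. A large value in either direction suggests moving to proper interrupted time series methods.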
