# Mining for Statistical Significance

The curtain has been drawn and the secrets to data analysis revealed. Do you have a data set sitting around in need of analysis? Read this and learn how to find significant results somewhere — anywhere — in your data. After all, negative results won’t get you published, you won’t get hired or tenured if you don’t publish, and your career will be a failure. Here’s a taste:

There are always anomalies. The World Series has been swept 17 times, five more than the model would predict. Plug this into the BINOMDIST function in Excel. (Understanding how this function works is optional and may in some cases be a disadvantage.) You find that, if the probabilities in the model were correct, there would be 17 or more sweeps in 95 occurrences only 8% of the time. A rotten break: you’re three lousy percent under statistical significance.
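For readers who would rather see what BINOMDIST is doing, here is a minimal Python sketch of the same upper-tail calculation. The excerpt does not state the model's per-series sweep probability, so the code assumes a coin-flip model (each game is 50/50), under which a best-of-seven series is swept with probability 2 × (1/2)^4 = 1/8. That assumption gives about 12 expected sweeps in 95 series, which agrees with the excerpt, but it is a guess at the model rather than the author's stated method. The names `binomial_upper_tail` and `p_sweep` are made up for illustration.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Probability of k or more successes in n independent trials,
    each succeeding with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: every game is a coin flip, so either team sweeps
# a best-of-seven series with probability (1/2)**4.
p_sweep = 2 * 0.5**4          # 0.125
n_series = 95                 # number of World Series in the excerpt
observed_sweeps = 17

expected_sweeps = n_series * p_sweep
tail = binomial_upper_tail(observed_sweeps, n_series, p_sweep)

print(f"expected sweeps: {expected_sweeps:.1f}")
print(f"P(at least {observed_sweeps} sweeps): {tail:.3f}")
```

Under that assumption the tail probability should come out near the 8% the excerpt quotes, above the conventional 5% cutoff.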

1. #1 Wesley Cowan
January 10, 2007

Of course, by carefully stroking the results, you can prove that you always win the Monty Hall Problem, no matter which door Monty opens.

This reminds me a lot of the PEAR group, studying the effect of global consciousness on the generation of a stream of random numbers.

In effect, crackery.

2. #2 RPM
January 10, 2007

Or satire and sarcasm thick enough to be cut by an overused metaphor.

3. #3 Lemon Curry
January 10, 2007

Sports teams (and players) get ranked in a single dimension, and the underlying assumption is that the dimension is transitive.

Yet we could have 3 equally good teams — equally good in that they each average 10 runs a game — although Team A usually beats Team B, Team B usually beats Team C, and Team C usually beats Team A. This demonstrates an intransitive relation.
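A small Python sketch can make that concrete. The run distributions below are invented for illustration (they are a shifted version of the classic nontransitive dice, not real team data): each hypothetical team scores one of three run totals with equal probability, and every team averages exactly 10 runs.

```python
from fractions import Fraction
from itertools import product

# Hypothetical run totals; each value is equally likely in any given game.
teams = {
    "A": [7, 9, 14],
    "B": [6, 11, 13],
    "C": [8, 10, 12],
}

def mean_runs(scores):
    """Average runs per game for one team."""
    return Fraction(sum(scores), len(scores))

def win_probability(x, y):
    """Exact probability that a team with scores x outscores one with scores y,
    assuming the two teams' scores are independent."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))

for name, scores in teams.items():
    print(name, "averages", mean_runs(scores), "runs")

for first, second in [("A", "B"), ("B", "C"), ("C", "A")]:
    p = win_probability(teams[first], teams[second])
    print(f"{first} beats {second} with probability {p}")
```

With these made-up numbers each matchup in the cycle favors the first team 5 games in 9, even though no team outscores the others on average, so no single ranking can be consistent with all three results.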

Mismatched finals can produce lopsided results when the winning team’s defense matches or betters the losing team’s offense, while the winners’ offense betters the losers’ defense.