In general, when we gather data, we expect to see a particular pattern in the data, called a *normal distribution*. A normal distribution is one where the data is distributed symmetrically around the mean in a very regular way, which when plotted as a histogram results in a *bell curve*. There are a lot of ways of defining “normal distribution” formally, but the simple intuitive idea is that in a normal distribution, things tend towards the mean – the closer a value is to the mean, the more often you’ll see it; and the number of values at any particular distance on either side of the mean is equal.

If you plot a set of data with a normal distribution on a graph, you get something that looks like a bell, with the hump of the bell positioned at the mean.

For example, here’s a graph that I generated using random numbers. I generated 1 million random numbers between 1 and 10; divided them into groups of ten; and then took the sum of each group of 10. The height of the graph at each x coordinate is the number of times the sum was that number. The mean came out to approximately 55, the mode was 55, and the median was 55 – which is what you’d hope for in a normal distribution. The number of times that 55 occurred was 432,000. 54 came up 427,000 times; 56 came up 429,000 times. 45 came up 245,000 times; 35 came up 38,000 times; and so on. The closer a value is to the mean, the more often it occurs in the population; the farther it is from the mean, the less often it occurs.
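If you want to try this yourself, here is a minimal Python sketch of that kind of experiment. It is not the code I used for the graph; the function name `sum_histogram`, the default of 100,000 groups, and the fixed seed are all assumptions made for illustration. The exact counts depend on how many groups you generate, so don't expect them to match the figures quoted above; what should show up is the shape, with counts peaking near 55 and falling off on both sides.

```python
import random
from collections import Counter
from statistics import mean, median

def sum_histogram(num_groups=100_000, group_size=10, low=1, high=10, seed=0):
    """Sum groups of random integers and count how often each sum occurs.

    All defaults here are illustrative assumptions, not the original setup.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same data
    sums = [
        sum(rng.randint(low, high) for _ in range(group_size))
        for _ in range(num_groups)
    ]
    return sums, Counter(sums)

sums, counts = sum_histogram()

print("mean:  ", mean(sums))
print("median:", median(sums))
print("mode:  ", counts.most_common(1)[0][0])

# Counts at a few sums on either side of 55: they should shrink as you
# move away from the mean, and be roughly equal at equal distances.
for value in (35, 45, 54, 55, 56, 65, 75):
    print(value, counts[value])
```

With a small number of groups the mode can land a step or two away from 55 purely by chance, which is a nice reminder that real data only approximates the smooth curve.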

In a perfectly normal distribution, you’ll get a perfectly smooth bell curve. In the real world, we don’t see perfect normal distributions, but most of the time in things like surveys, we *expect* to see something close. Of course, that’s also the key to how a lot of statistical misrepresentation is created – people exploit the expectation that there’ll be a normal distribution, and either don’t mention, or don’t even check, whether the distribution *is* normal. If it is *not* normal, then many of the conclusions that you might want to draw don’t make sense.

For example, the salary example from the mean, median, and mode post is a case of exactly this. The median there is so different from the mean because the distribution is severely skewed away from a normal distribution. (Remember, in a proper normal distribution, the number of values at the same distance on either side of the mean should be equal. But in this example, the mean was 200,000; if you went plus 100,000, you’d get one value; if you went minus 100,000, you’d also get one – which looks good. But if you went plus 200,000, you’d still get just one; if you went minus 200,000, you’d get twenty-one values!) It’s a common rhetorical trick to take a very abnormal distribution, not mention that it’s abnormal, and quote something about the mean in order to support an argument.
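Here is a rough Python sketch of that symmetry check. The payroll below is invented purely for illustration (it is *not* the data from the salary post), and `counts_within` is a hypothetical helper name. The idea is just to compare the mean with the median, and then count how many values fall within a given distance below and above the mean.

```python
from statistics import mean, median

def counts_within(values, center, distance):
    """Return (below, above): how many values lie within `distance`
    of `center` on each side, not counting values equal to `center`."""
    below = sum(1 for v in values if center - distance <= v < center)
    above = sum(1 for v in values if center < v <= center + distance)
    return below, above

# Hypothetical, made-up payroll: many modest salaries and one huge one.
salaries = [20_000] * 20 + [50_000] * 3 + [1_000_000]

avg = mean(salaries)
mid = median(salaries)
print("mean:  ", round(avg))
print("median:", mid)

# In a roughly normal distribution these two counts would be close.
below, above = counts_within(salaries, avg, 50_000)
print("within 50,000 below the mean:", below)
print("within 50,000 above the mean:", above)
```

When the two counts are wildly different, as they are for this made-up payroll, the mean on its own tells you very little about a typical value.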

For example, the last round of tax cuts put through by the Bush administration was *very* strongly biased towards wealthy people. But during the last presidential election, in speech after speech, ad after ad, we heard about how much the *average* American taxpayer saved as a result of the tax cuts. In fact, most people didn’t get much; a fair number of people saw an effective *increase* because of the AMT; and a small number of people got *huge* cuts.

For a different example, the high school that I went to in New Jersey was considered one of the best schools in the state for math. But the vast majority of the math teachers there were just horrible – they had three or four really great teachers, and a dozen jackasses who should never have been allowed in front of a classroom. But the top performing math students in the school did *so* well that we significantly raised the mean for the school, making it look as though the *typical* student in the school was good at math. In fact, if you looked at a graph of the distribution of scores, what you would see would be what’s called a *bimodal* distribution: there would be *two* bells side by side – a narrow bell toward the high end of the scores (corresponding to the scores of that small group of students with the great teachers), and a shorter wide bell well to its left, representing the rest of the students.
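To see what that looks like, here is a small Python sketch that fakes a bimodal set of scores. Every number in it is an assumption chosen for illustration (30 students centered near 92, 170 students centered near 60, the spreads, and the seed); none of it is real data from my school. It prints the mean with and without the top group, plus a crude text histogram.

```python
import random
from collections import Counter
from statistics import mean, median

rng = random.Random(42)  # fixed seed so repeated runs give the same data

def clamp(score):
    """Keep a simulated score inside the 0-100 range."""
    return max(0.0, min(100.0, score))

# Hypothetical groups: a small, tightly clustered high-scoring group and
# a larger, more spread-out group with a much lower center.
top = [clamp(rng.gauss(92, 3)) for _ in range(30)]
rest = [clamp(rng.gauss(60, 12)) for _ in range(170)]
scores = top + rest

print("mean of everyone:       ", round(mean(scores), 1))
print("mean without top group: ", round(mean(rest), 1))
print("median of everyone:     ", round(median(scores), 1))

# Crude text histogram in 10-point bins; a score of exactly 100 is
# folded into the 90-99 bin.
bins = Counter(min(int(s // 10) * 10, 90) for s in scores)
for start in range(0, 100, 10):
    print(f"{start:3d}-{start + 9:<3d} {'#' * bins[start]}")
```

The small top group pulls the overall mean above the mean of everyone else, and the histogram should show a second bump at the high end instead of a single bell.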