In general, when we gather data, we expect to see a particular pattern in
the data, called a normal distribution. A normal distribution is one
where the data is evenly distributed around the mean in a very regular way,
so that when it’s plotted as a
histogram, the result is a bell curve. There are a lot of ways of
defining “normal distribution” formally, but the simple intuitive idea of it
is that in a normal distribution, things tend towards the mean – the closer a
value is to the mean, the more you’ll see it; and the number of values at any
particular distance above the mean equals the number at that distance below it.
If you plot a set of data with a normal distribution
on a graph, you get something that looks like a bell, with the hump of the
bell positioned at the mean.
For example, here’s a graph that I generated using random numbers. I
generated 1 million random numbers between 1 and 10; divided them into groups
of ten; and then took the sum of each group of 10. The height of the graph at
each x coordinate is the number of times the sum came out to that
number. The mean came out to approximately 55, the mode was 55, and the median
was 55 – which is what you’d hope for in a normal distribution. The number of
times that 55 occurred was 432,000. 54 came up 427,000 times; 56 came up
429,000. 45 came up 245,000 times; 35 came up 38,000 times; and so on. The
closer a value is to the mean, the more often it occurs in the population;
the farther it is from the mean, the less often it occurs.
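Here’s a minimal sketch of that kind of experiment in Python. It is not the code behind the graph described above; the function name simulate_sums, its parameters, and the seed are all made up for illustration, and it assumes that “random numbers between 1 and 10” means uniformly distributed integers with both ends included. The absolute counts depend on how many numbers you draw and on the seed, so only the shape of the result should be expected to match.

```python
import random
from collections import Counter
from statistics import mean, median, mode


def simulate_sums(n_numbers=1_000_000, group_size=10, low=1, high=10, seed=None):
    """Draw n_numbers random integers in [low, high] (assumed uniform and
    inclusive), split them into consecutive groups of group_size, and
    return the list of group sums."""
    rng = random.Random(seed)
    n_groups = n_numbers // group_size
    return [
        sum(rng.randint(low, high) for _ in range(group_size))
        for _ in range(n_groups)
    ]


if __name__ == "__main__":
    sums = simulate_sums(seed=42)  # the seed is arbitrary
    counts = Counter(sums)         # counts[s] = how many groups summed to s

    print("mean:", mean(sums), "median:", median(sums), "mode:", mode(sums))

    # Compare how often values at equal distances from 55 occur.
    for distance in (0, 1, 10, 20):
        below, above = 55 - distance, 55 + distance
        print(f"{below}: {counts[below]:6d}    {above}: {counts[above]:6d}")
```

The loop at the end prints the counts for pairs of sums at the same distance on either side of 55, which is the symmetry that the bell shape depends on.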
In a perfectly normal distribution, you’ll get a perfectly smooth bell
curve. In the real world, we don’t see perfect normal distributions, but most
of the time in things like surveys, we expect to see something close. Of
course, that’s also the key to how a lot of statistical misrepresentation is
created – people exploit the expectation that there’ll be a normal
distribution, and either don’t mention, or don’t even check, whether the
distribution is normal. If it is not normal, then many of
the conclusions that you might want to draw don’t make sense.
For example, the salary example from the mean, median, and mode post is a case of exactly this. The reason that the median is so different from the mean is that the distribution is severely skewed away from a normal distribution. (Remember, in a proper normal distribution, the number of values at the same distance on either side of the mean should be equal. But in this example, the mean was 200,000; if you went plus 100,000, you’d get one value; if you went minus 100,000, you’d also get one – which looks good. But if you went plus 200,000, you’d still get just one; if you went minus 200,000, you’d get twenty-one values!) It’s a common rhetorical trick to take a very abnormal distribution, not mention that it’s abnormal, and quote something about the mean in order to support an argument.
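The “count the values on either side of the mean” check is easy to sketch in Python. The helper counts_around_mean and the salary list below are hypothetical, invented for illustration; they are not the figures from the salary post.

```python
from statistics import mean, median


def counts_around_mean(values, distance):
    """Return (below, above): how many values lie within `distance`
    below the mean and within `distance` above it. Values exactly
    equal to the mean are counted on neither side."""
    m = mean(values)
    below = sum(1 for v in values if m - distance <= v < m)
    above = sum(1 for v in values if m < v <= m + distance)
    return below, above


# Hypothetical payroll: nine modest salaries and one very large one.
salaries = [40_000] * 9 + [1_000_000]

print("mean:", mean(salaries), "median:", median(salaries))
for distance in (100_000, 900_000):
    print(distance, counts_around_mean(salaries, distance))
```

For a roughly normal data set, the two counts should come out close to each other at every distance. When they are badly lopsided, as with this made-up payroll, the mean by itself tells you very little about a typical value.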
For example, the last round of tax cuts put through by the Bush administration was very strongly biased towards wealthy people. But during the last presidential election, in speech after speech, ad after ad, we heard about how much the average American taxpayer saved as a result of the tax cuts. In fact, most people didn’t get much; a fair number of people saw an effective increase because of the AMT; and a small number of people got huge cuts.
For a different example, the high school that I went to in New Jersey was considered one of the best schools in the state for math. But the vast majority of the math teachers there were just horrible – the school had three or four really great teachers, and a dozen jackasses who should never have been allowed in front of a classroom. But the top performing math students in the school did so well that we significantly raised the mean for the school, making it look as though the typical student in the school was good at math. In fact, if you looked at a graph of the distribution of scores, what you would see is what’s called a bimodal distribution: there would be two bells side by side – a narrow bell toward the high end of the scores (corresponding to the scores of that small group of students with the great teachers), and a shorter, wide bell well to its left, representing the rest of the students.
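To make the shape concrete, here’s a small Python sketch of a bimodal set of scores. Every number in it is invented for illustration (these are not real scores from any school), and local_peaks is a hypothetical helper that just looks for scores that occur more often than both of their neighbors.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical scores, invented for illustration only.
# A short, wide bell centered on 60 for most of the students...
rest_of_school = [45] * 10 + [50] * 20 + [55] * 25 + [60] * 30 + [65] * 25 + [70] * 20 + [75] * 10
# ...and a tall, narrow bell centered on 95 for the top group.
top_group = [90] * 10 + [95] * 40 + [100] * 10
scores = rest_of_school + top_group


def local_peaks(counts):
    """Return the scores that occur more often than both neighboring scores."""
    keys = sorted(counts)
    return [
        k
        for prev, k, nxt in zip(keys, keys[1:], keys[2:])
        if counts[k] > counts[prev] and counts[k] > counts[nxt]
    ]


counts = Counter(scores)

# Crude text histogram: one '#' per student at each score.
for score in sorted(counts):
    print(f"{score:3d} {'#' * counts[score]}")

below = sum(1 for s in scores if s < mean(scores))
print("mean:", mean(scores), "median:", median(scores))
print("peaks:", local_peaks(counts))
print(f"{below} of {len(scores)} students score below the mean")
```

With these made-up numbers, the mean lands in the gap between the two humps, at a score that nobody actually got, and well over half of the students sit below it.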