The Basics of Statistics V: A Quick Example

So the last post was pretty dense, and I haven't used an example since the first post, so I thought I'd throw one out there that you can play with. In what follows, I pretend to use the equations, but I'm actually doing all this in Excel. If you've got Excel, here are some helpful functions. AVERAGE gives you the mean of a range of numbers, VAR gives you the variance, and STDEV gives you the standard deviation. Note that VAR and STDEV give you the variance and standard deviation for a sample (i.e., using n-1 instead of n). If you want the population variance and standard deviation, use VARP and STDEVP. Also, since I'm often interested in getting the standard error quickly, I usually use this formula to get it in Excel: STDEV(range)/SQRT(COUNT(range)). Or you can just do the math by hand if you enjoy that sort of thing. On to the problem.
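If you don't have Excel, the same quantities are available in Python's standard library. The sketch below is only a rough mapping of the functions named above onto the `statistics` module; the `values` list is a toy example made up for illustration, not data from this post.

```python
import math
import statistics

# Toy values chosen only for illustration (not from the post).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)            # Excel: AVERAGE
sample_var = statistics.variance(values)  # Excel: VAR    (divides by n-1)
sample_sd = statistics.stdev(values)      # Excel: STDEV  (divides by n-1)
pop_var = statistics.pvariance(values)    # Excel: VARP   (divides by n)
pop_sd = statistics.pstdev(values)        # Excel: STDEVP (divides by n)

# Excel: STDEV(range)/SQRT(COUNT(range))
std_error = sample_sd / math.sqrt(len(values))

print("mean:", mean)
print("sample variance / sd:", sample_var, sample_sd)
print("population variance / sd:", pop_var, pop_sd)
print("standard error:", std_error)
```

The sample versions always come out a little larger than the population versions, because they divide by n-1 rather than n.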

Imagine we're given a problem: what is the average height of male liberal bloggers in Washington DC? We could go out and survey every male liberal blogger in Washington DC, but since it's DC, there are about 3 gazillion liberal bloggers there, and we don't have the time or money to survey them all. So instead, we randomly select 30 male liberal bloggers from DC, and get their heights. Here are their heights in inches (sorry metric people, but DC is in the States):

62
62
76
68
59
62
60
74
63
60
63
68
74
61
69
60
66
57
66
72
74
64
71
47
66
66
71
60
60
77

First we compute the mean with ΣX/n. ΣX = 1958, n = 30, and 1958/30 = 65.27. So x̄ = 65.27 inches.

Then we compute the variance, using Σ(X - x̄)²/(n-1). Remember, we're using n-1 because we're trying to estimate the variance of the population using a sample (see post III). Using that equation, we get a variance of s² = 43.65. Take the square root of that, and we get the standard deviation, s = 6.61.
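If you'd rather check these numbers in code than in Excel, here is a minimal Python sketch of the same arithmetic. The variable names are mine; the data are the 30 heights listed above, and the formulas are written out by hand so each step matches the equations.

```python
# The 30 heights (in inches) from the list above, in the same order.
heights = [62, 62, 76, 68, 59, 62, 60, 74, 63, 60,
           63, 68, 74, 61, 69, 60, 66, 57, 66, 72,
           74, 64, 71, 47, 66, 66, 71, 60, 60, 77]

n = len(heights)
total = sum(heights)   # sum of X
mean = total / n       # sum of X divided by n

# Sample variance: sum of squared deviations from the mean, divided by n-1.
variance = sum((x - mean) ** 2 for x in heights) / (n - 1)
sd = variance ** 0.5   # standard deviation is the square root of the variance

print(total, n)              # 1958 30
print(f"{mean:.2f}")         # 65.27
print(f"{variance:.2f}")     # 43.65
print(f"{sd:.2f}")           # 6.61
```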

Next we need to compute some measure of our confidence that our sample mean represents our population mean. Obviously, it's unlikely that with one sample we'd get the exact mean of the population, so we need to decide how confident we want to be about our mean, and then find a range that we can be that confident contains the population mean. Let's choose 95% confidence, and since we don't know the population's standard deviation and are therefore using the t-distribution, look up the critical t-value (see post IV) for 95% confidence in a t-table. We have 30 observations, so our degrees of freedom is equal to 30 - 1, or 29. In the table, scroll down to 29, then over to .025 (not .05, see post IV). There you'll find that our critical t-value is 2.045.

One last step before we can get the confidence interval: we have to get the standard error. Using the sample's standard deviation as an estimate of the population's standard deviation, we compute the standard error (s_x̄) with:

s_x̄ = s/√n

Plugging in our numbers, we get 6.61/√30, which gives us a standard error of 1.21. Now we can use our confidence interval equation from post IV, which will give us:

65.27 - (2.045 × 1.21) ≤ μ ≤ 65.27 + (2.045 × 1.21)

Doing the math, we get this for our 95% confidence interval: 62.80 ≤ μ ≤ 67.74.

We report back that according to our sample, the average height of a male liberal blogger in Washington DC is 65.27 inches, and that we're 95% confident that the average height is between 62.80 and 67.74 inches.
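For anyone who wants to reproduce the interval outside of Excel, here is a rough Python sketch of the whole calculation. The function name `confidence_interval` and the constant `T_CRIT_95` are mine, not anything from the earlier posts. The critical value is typed in by hand from a standard t-table (29 degrees of freedom, .025 in each tail, which is 2.045), because Python's standard library has no t-distribution.

```python
import math
import statistics

# The 30 heights (in inches) from the sample above.
heights = [62, 62, 76, 68, 59, 62, 60, 74, 63, 60,
           63, 68, 74, 61, 69, 60, 66, 57, 66, 72,
           74, 64, 71, 47, 66, 66, 71, 60, 60, 77]

# Assumption: read from a t-table at df = 29, .025 per tail.
# If SciPy is installed, scipy.stats.t.ppf(0.975, 29) gives the same value.
T_CRIT_95 = 2.045


def confidence_interval(data, t_crit):
    """Return (lower, upper, standard_error) for the mean of a sample."""
    n = len(data)
    mean = statistics.mean(data)
    std_error = statistics.stdev(data) / math.sqrt(n)  # s / sqrt(n)
    margin = t_crit * std_error
    return mean - margin, mean + margin, std_error


low, high, se = confidence_interval(heights, T_CRIT_95)
print(f"standard error: {se:.2f}")
print(f"95% CI: {low:.2f} to {high:.2f}")
```

With 2.045 this gives an interval of about 62.80 to 67.73. Because the code carries full precision instead of the rounded 65.27 and 1.21, a bound can differ in the last digit from a hand calculation. Passing 2.756 instead (the .005 column of the same table row, which is the 99% confidence value) widens the interval to roughly 61.94 to 68.59.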

This is fairly off topic, but I am having trouble running a Tukey test, which might be because it's the wrong test to run for what I need.

Basically I have run an ANOVA and I need to check for variance between groups.

I have four variables, the first has two factors, the other three have 3 factors. Which gives 54 rows of data. There is one trial run and nine replicates, so essentially 10 trials all together.

For whatever reason SPSS won't let me run an ANOVA with a Tukey test on this data. So I downloaded PHStat2 for Excel, which ran the test fine. I looked up the Q value, entered it, and all seemed fine, except that PHStat ran the Tukey test on columns and not rows. So I transposed the columns, but now I don't know how to calculate the Q value for more than 10 groups.

Is Tukey the right test? If so how do I look up the q value for my data?