The Basics of Statistics V: A Quick Example

The last post was pretty dense, and I haven't used an example since the first post, so I thought I'd throw one out there that you can play with. In what follows, I pretend to use the equations, but I'm actually doing all this in Excel. If you've got Excel, here are some helpful functions: AVERAGE gives you the mean of a range of numbers, VAR gives you the variance, and STDEV gives you the standard deviation. Note that VAR and STDEV give you the variance and standard deviation for a sample (i.e., dividing by n-1 instead of n). If you want the population variance and standard deviation, use VARP and STDEVP. Also, since I'm often interested in getting the standard error quickly, I usually use this formula to get it in Excel: =STDEV(range)/SQRT(COUNT(range)). Or you can just do the math by hand if you enjoy that sort of thing. On to the problem.
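
If you'd rather follow along in code than in Excel, here is a minimal sketch in Python using only the standard library's statistics module. The toy numbers are made up purely for illustration (they are not the height sample), and the helper name standard_error is my own invention; the comments note which Excel function each call roughly corresponds to.

```python
import math
import statistics

# Made-up toy data, only to show which call matches which Excel function.
toy = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(toy)            # Excel: AVERAGE(range)
sample_var = statistics.variance(toy)  # Excel: VAR(range), divides by n-1
sample_sd = statistics.stdev(toy)      # Excel: STDEV(range), divides by n-1
pop_var = statistics.pvariance(toy)    # Excel: VARP(range), divides by n
pop_sd = statistics.pstdev(toy)        # Excel: STDEVP(range), divides by n


def standard_error(values):
    """Sample standard deviation divided by the square root of the count."""
    return statistics.stdev(values) / math.sqrt(len(values))


print("mean:", mean)
print("sample variance and SD:", round(sample_var, 2), round(sample_sd, 2))
print("population variance and SD:", pop_var, pop_sd)
print("standard error:", round(standard_error(toy), 2))
```

The sample versions always come out a little larger than the population versions, because dividing by n-1 instead of n inflates the result slightly.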

Imagine we're given a problem: what is the average height of male liberal bloggers in Washington DC? We could go out and survey every male liberal blogger in Washington DC, but since it's DC, there are about 3 gazillion liberal bloggers there, and we don't have the time or money to survey them all. So instead, we randomly select 30 male liberal bloggers from DC and get their heights. Here are their heights in inches (sorry metric people, but DC is in the States):

62
62
76
68
59
62
60
74
63
60
63
68
74
61
69
60
66
57
66
72
74
64
71
47
66
66
71
60
60
77

First we compute the mean with ΣX/n. ΣX = 1958, n = 30, and 1958/30 = 65.27. So x̄ = 65.27 inches.

Then we compute the variance, using Σ(X - x̄)²/(n-1). Remember, we're using n-1 because we're trying to estimate the variance of the population using a sample (see post III). Using that equation, we get a variance of s² = 43.65. Take the square root of that, and we get the standard deviation, s = 6.61.
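
For anyone who wants to check the mean, variance, and standard deviation without Excel, here is a small Python sketch that applies the two formulas directly to the 30 heights listed above. The variable names are my own; nothing here goes beyond the arithmetic already described.

```python
import math

# The 30 heights (in inches) from the sample.
heights = [62, 62, 76, 68, 59, 62, 60, 74, 63, 60,
           63, 68, 74, 61, 69, 60, 66, 57, 66, 72,
           74, 64, 71, 47, 66, 66, 71, 60, 60, 77]

n = len(heights)
total = sum(heights)   # sum of X
mean = total / n       # sum of X divided by n

# Sum of squared deviations from the mean, divided by n-1 (sample variance).
sum_sq_dev = sum((x - mean) ** 2 for x in heights)
variance = sum_sq_dev / (n - 1)
std_dev = math.sqrt(variance)

print(f"sum = {total}, n = {n}, mean = {mean:.2f}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

Run as written, this should reproduce the figures above: a sum of 1958, a mean of 65.27, a variance of 43.65, and a standard deviation of 6.61.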

Next we need some measure of our confidence that our sample mean represents our population mean. Obviously, it's unlikely that one sample would give us the exact mean of the population, so we need to decide how confident we want to be, and then find a range within which we can be that confident that the population mean falls. Let's choose 95% confidence. Since we don't know the population's standard deviation and are therefore using the t-distribution, we look up the critical t-value (see post IV) for 95% confidence in a t-table. We have 30 observations, so our degrees of freedom are 30 - 1, or 29. In the table, scroll down to 29, then over to .025 (not .05, see post IV). There you'll find that our critical t-value is 2.045.

One last step before we can get the confidence interval: we have to get the standard error. Using the sample's standard deviation as an estimate of the population's standard deviation, we compute the standard error (s_x̄) with:

s_x̄ = s/√n

Plugging in our numbers, we get 6.61/√30, which gives us a standard error of 1.21. Now we can use our confidence interval equation from post IV, which will give us:

65.27 - (2.045 x 1.21) ≤ μ ≤ 65.27 + (2.045 x 1.21)

Doing the math, we get this for our 95% confidence interval: 62.80 ≤ μ ≤ 67.74.

We report back that according to our sample, the average height of a male liberal blogger in Washington DC is 65.27 inches, and that we're 95% confident that the population's average height is between 62.80 and 67.74 inches.
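
The standard error and confidence interval can be sketched in Python as well. This is a rough illustration, not a general-purpose routine: the critical value is hard-coded as 2.045, which is what a standard t-table lists for 29 degrees of freedom with .025 in each tail, and the function name confidence_interval is my own. Because the code carries unrounded intermediate values, its endpoints can differ from hand arithmetic on rounded figures by about a hundredth of an inch.

```python
import math
import statistics

# The 30 heights (in inches) from the sample.
heights = [62, 62, 76, 68, 59, 62, 60, 74, 63, 60,
           63, 68, 74, 61, 69, 60, 66, 57, 66, 72,
           74, 64, 71, 47, 66, 66, 71, 60, 60, 77]

# Assumption: taken from a t-table (df = 29, .025 per tail), not computed.
T_CRIT_95_DF29 = 2.045


def confidence_interval(values, t_crit):
    """Return (lower, upper, standard_error) for the mean of a sample."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # s / sqrt(n)
    margin = t_crit * std_err
    return mean - margin, mean + margin, std_err


lower, upper, std_err = confidence_interval(heights, T_CRIT_95_DF29)
print(f"standard error = {std_err:.2f}")
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

For a different sample size or confidence level, you would need to look up a new critical value and pass it in as t_crit.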

This is fairly off topic, but I am having trouble running a Tukey test, which might be because it's the wrong test to run for what I need.

Basically I have run an ANOVA and I need to check for variance between groups.

I have four variables, the first has two factors, the other three have 3 factors. Which gives 54 rows of data. There is one trial run and nine replicates, so essentially 10 trials all together.

For whatever reason, SPSS won't let me run an ANOVA with a Tukey test on this data. So I downloaded PHStat2 for Excel, which ran the test fine. I looked up the Q value, entered it, and all seemed fine. Except PHStat ran the Tukey test on columns and not rows. So I transposed the columns, but now I don't know how to calculate the Q value for more than 10 groups.

Is Tukey the right test? If so how do I look up the q value for my data?