The Basics of Statistics IV: Confidence Intervals

So far we've been talking about different distributions and their parameters. If we're looking at a population with known parameters, then we're going to be dealing with either a normal distribution or a standardized normal distribution (Posts I and II). If we're dealing with samples, we're going to use either the sampling distribution of means, if the population parameters are known, or more often, the t-distribution if they're not (Post III). Normal and standardized distributions allow us to determine the probability associated with a particular value of a variable in a population, and thus how likely that value is. Sampling and t-distributions let us look at the probabilities associated with sample means. That is, the sampling and t-distributions give us a measure of the probability of getting a sample with a particular mean from a parent population. This takes us one step closer to making inferences from statistics, because if we know the likelihood of getting a sample with mean x, then we can also determine how confident we are that the population mean falls within a range around x. To do this, we compute confidence intervals.

Let's start by computing confidence intervals using the sampling distribution. Obviously, we won't be doing this very often in the real world, because as we saw in the last post, we can only use the sampling distribution if we know the population's standard deviation (σ). But let's start with it anyway, because the logic of confidence intervals is a bit easier to spell out this way.

The first step is to determine how confident we want to be. We could pick any number, but the more confident we are the better, right? It's pretty standard in social science research to use confidence intervals of 95%. That is, the range within which we are 95% certain the population mean occurs (put another way, if we took a whole lot of samples from the population, about 95% of those samples would have means in that range). Since we're using the sampling distribution with the known standard deviation of the population, we can first look to the standardized normal distribution to figure out the z-scores associated with our chosen confidence level. For example, if we choose 95% confidence, we have to find the z-scores (+ and -) between which 95% of the values occur. Then we use the following equation, in which z stands for the z-score value associated with 95% confidence:

x - zσx ≤ μ ≤ x + zσx

Looking up the z-score associated with 95% confidence (you can use an interactive z-table here), I find that it is +/- 1.96. This is sometimes called the critical z-score, or critical z. So substituting this number for z, we can say with 95% confidence that the population mean is between the sample mean (x) minus 1.96 times the standard error (σ/√n; represented in the equation as σx) and the sample mean plus 1.96 times the standard error.
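If you'd rather let a computer do the lookup, here's a rough sketch of the same calculation in Python. The sample mean, σ, and n are made-up numbers, and `z_confidence_interval` is just a name I've picked for illustration, not something from a library. The critical z comes from the standard library's `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    # Split the leftover probability evenly between the two tails
    # (.025 in each tail for 95% confidence).
    tail = (1 - confidence) / 2
    critical_z = NormalDist().inv_cdf(1 - tail)
    standard_error = sigma / sqrt(n)
    margin = critical_z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical numbers, purely for illustration.
critical_z = NormalDist().inv_cdf(0.975)
lower, upper = z_confidence_interval(sample_mean=100.0, sigma=15.0, n=25)
print(f"critical z = {critical_z:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Changing `confidence` to .90 or .99 narrows or widens the interval, which is the trade-off behind picking a confidence level in the first place.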

OK, if you've wrapped your mind around that, let's move on to the more likely case of computing confidence intervals without a known population standard deviation. In this case, we have to use the sample standard deviation to estimate the population standard deviation. That means we're dealing with a t-distribution, so instead of using the z-scores associated with our level of confidence, we'll use t-scores. First, we pick a confidence level -- let's go with 95% again -- and find the t-value associated with that (the critical t-value, or critical t). Then we plug the sample mean and standard error (obtained with s/√n; represented as sx) and the critical t into this equation:

x - tsx ≤ μ ≤ x + tsx

Of course, since we learned in the last post that there is a different t-distribution for each number of degrees of freedom, we know that there's a different critical t-value for each number of df as well. Suppose we have 12 degrees of freedom. We look up the critical t-value in a table like this one by going down the table to 12 df (in the far left column), then scrolling over until we find the value under .025 (not .05, because adding .05 from both tails of the distribution would give us .10, or the critical t for 90% confidence intervals). The critical t(df = 12) = 2.18, so we can now say with 95% confidence that the population mean falls between x minus 2.18 times the standard error (sx) and x plus 2.18 times the standard error.
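Here's a minimal Python sketch of that t-based interval. The 13 scores are invented so that df = n - 1 = 12, the critical t is typed in by hand from the table value above rather than computed, and `t_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def t_confidence_interval(sample, critical_t):
    """Confidence interval for the mean using the sample standard deviation."""
    n = len(sample)
    sample_mean = mean(sample)
    # stdev() divides by n - 1, so it is the sample standard deviation s.
    standard_error = stdev(sample) / sqrt(n)
    margin = critical_t * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample of 13 scores, so df = 13 - 1 = 12.
scores = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14, 11, 13, 12]
CRITICAL_T_DF12 = 2.18  # table value for df = 12 with .025 in each tail

t_lower, t_upper = t_confidence_interval(scores, CRITICAL_T_DF12)
print(f"sample mean = {mean(scores):.2f}")
print(f"95% CI: {t_lower:.2f} to {t_upper:.2f}")
```

With a different sample size you'd need the critical t for the new df, since the hard-coded 2.18 only applies to 12 degrees of freedom.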

Clear as mud, right? If you're confused at this point, don't worry, you're not alone. In the next post, we'll work through an example, and that should make things less muddy.



That is, the range within which we are 95% certain the population mean occurs (put another way, if we took a whole lot of samples from the population, about 95% of those samples would have means in that range).

I think this is doubly wrong, unfortunately. First of all, the correct frequentist interpretation is that 95% of confidence intervals calculated in this way will contain the population mean, not that 95% of all samples will have means in the range of the confidence interval you calculated from the first sample.
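For anyone who wants to see the difference between those two readings, here's a quick simulation sketch in Python. It assumes a made-up normal population with a known σ (so the z-interval applies), and every number and name in it is an assumption chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical population and settings, chosen only for illustration.
POP_MEAN, POP_SD, N, TRIALS = 50.0, 10.0, 25, 2000

rng = random.Random(1)  # fixed seed so reruns give the same numbers
critical_z = NormalDist().inv_cdf(0.975)
margin = critical_z * POP_SD / sqrt(N)


def draw_sample_mean():
    """Mean of one random sample of size N from the population."""
    return mean([rng.gauss(POP_MEAN, POP_SD) for _ in range(N)])


sample_means = [draw_sample_mean() for _ in range(TRIALS)]

# Reading 1: share of intervals (one per sample) containing the population mean.
coverage = sum(abs(m - POP_MEAN) <= margin for m in sample_means) / TRIALS

# Reading 2: share of later sample means landing inside the FIRST sample's interval.
first = sample_means[0]
inside_first = sum(abs(m - first) <= margin for m in sample_means[1:]) / (TRIALS - 1)

print(f"intervals containing the population mean: {coverage:.3f}")
print(f"sample means inside the first interval:   {inside_first:.3f}")
```

The first share should hover around .95. The second depends on how far the first sample mean happened to land from the population mean, and it will typically come out lower.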

Secondly, you're conflating the (incorrect) frequentist interpretation with personal certitude too glibly. You can't get a Bayesian interpretation out of a frequentist technique so easily. If you want personal confidence, you have to start with personal priors.

Thanks for continuing the series. I'm sure it's very helpful for many readers. I really don't mean to keep griping, and I think you're doing a great job. I figure it's my role as a statistician / commenter to try to clear up some misconceptions, though.

correct frequentist interpretation

"Correct" and "frequentist" aren't words I normally associate, being a Bayesian myself. ;) Nonetheless, the comment is correct: the interpretation in the post is wrong. "Confidence" in frequentist inference does not apply to parameters because parameters are not random variables, they are constants.

There are some really great frequentist tools (I do so love maximum likelihood estimation found in programs like Stata and SAS) but the interpretation of things like frequentist confidence intervals is just... counter-intuitive.

You can't get a Bayesian interpretation out of a frequentist technique so easily. If you want personal confidence, you have to start with personal priors.

The frequentist CI corresponds to a Bayesian CI with an (improper) "flat" prior that leads to a proper posterior. Not the interpretation of it, of course, but the calculation. When the sample size is anywhere near respectable and you have a proper non-informative prior such as Jeffreys' prior, the data will dominate and you won't be able to tell the difference. (Note: "Proper" is a technical term that refers to whether the prior is a true density. There is no uniform distribution over the entire real line, which is why the flat prior is improper. It involves some relatively simple calculus but isn't all that important.)

Ironically enough, a lot of the research in Bayesian statistics is on picking non-informative priors. I took Bayesian stats in grad school from someone who wasn't quite convinced and he spent a long time forcing us to find non-informative priors. In the area I research (psychometrics) informative priors have been common for decades in the guise of penalty terms in likelihood estimation so they don't really bother me much.