I have a bone to pick with The Weather Channel, and it has to do with misuse of statistics. This is something I noticed a long time ago, so it’s about time I said something about it. The problem here is fairly obvious, so I’m sure many others have noticed this before. Also, this may not be specific to The Weather Channel, but I’m just using it as an example because that is where I have observed this problem.
To the left, you can see tomorrow’s forecast for Austin, Texas, from The Weather Channel. The key piece of information here (for this post) is that there is a 40% chance of precipitation. There is a little bit of ambiguity as to what exactly this means, but I interpret this as saying that there is a 40% chance that at some point tomorrow it will rain. Since The Weather Channel gives forecasts for both the day and the night, I’m going to assume that this forecast only pertains to daylight hours.
That’s all well and good, but below I have pasted the hourly forecast for tomorrow. Can you spot the problem?
(Disclaimer: the image above has been pieced together from multiple images, because not all of the relevant hours appeared on the same webpage.)
Let’s start from the beginning. The Weather Channel gives a 40% chance of precipitation at 8 am. Once again, this is a bit ambiguous, but since the forecast is given hourly, I’m assuming that this means there is a 40% chance that it will rain at some point during the hour following 8 am. (I acknowledge that this is just an assumption, but the meaning must be something similar. Surely this forecast doesn’t mean that there is a 40% chance of precipitation at exactly 8 am, because that would be even more ludicrous than the problem I describe below.)
The problem starts when we get to 9 am, where there is also a 40% chance of precipitation. Since the probability of rain in the hour following 8 am (P8a) is 0.4 and the probability of rain in the hour following 9 am (P9a) is also 0.4, we can calculate the probability that it will rain at some point during this two-hour period (update: this assumes that each hour is independent, an assumption that has some limitations, as noted below):
$$P_{8a,9a} = P_{8a} + P_{9a} - P_{8a}P_{9a}$$
$$P_{8a,9a} = 0.4 + 0.4 - (0.4 \times 0.4) = 0.64$$
Just to explain what I’ve done here, I’ve added the probability of rain in the first hour and rain in the second hour, and I have subtracted the overlapping probability of rain in both hours. Alternatively, we could calculate the probability that it doesn’t rain in either hour and subtract it from one:
$$P_{8a,9a} = 1 - \big((1 - P_{8a}) \times (1 - P_{9a})\big)$$
$$P_{8a,9a} = 1 - \big((1 - 0.4) \times (1 - 0.4)\big) = 0.64$$
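Both routes to the two-hour figure can be checked with a short Python sketch. Like the equations, it assumes the two hours are independent; the function names are my own, chosen only for illustration.

```python
def p_either_hour(p1: float, p2: float) -> float:
    """Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).

    P(A and B) = P(A) * P(B) only because the hours are assumed independent.
    """
    return p1 + p2 - p1 * p2


def p_either_hour_complement(p1: float, p2: float) -> float:
    """Complement rule: 1 - P(no rain in hour 1 and no rain in hour 2)."""
    return 1 - (1 - p1) * (1 - p2)


p_8a, p_9a = 0.4, 0.4  # hourly chances from the forecast
print(round(p_either_hour(p_8a, p_9a), 2))             # 0.64
print(round(p_either_hour_complement(p_8a, p_9a), 2))  # 0.64
```

The rounding only hides floating-point noise; the two formulas are algebraically identical.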
Either way you do the math, there should be a 64% chance that it will rain at some point in that two-hour period. This is a problem, because according to the daily forecast, there’s only a 40% chance that it will rain at some point over the whole day.
This discrepancy is only magnified if we expand our calculation to the full day:
$$P_{\text{day}} = 1 - \big((1 - P_{8a}) \times (1 - P_{9a}) \times \cdots \times (1 - P_{7p})\big)$$
$$P_{\text{day}} = 1 - \big((1 - 0.4) \times (1 - 0.4) \times \cdots \times (1 - 0.3)\big) = 0.99$$
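The same complement rule extends to any number of hours. The sketch below assumes independent hours and takes the hourly chances as a list. The full hourly forecast appears only in the image, so the two lists here are hypothetical bounds over the 12 hourly slots from 8 am to 7 pm (an assumption on my part): every hour at 40%, and every hour at 30%.

```python
from math import prod


def p_rain_any_hour(hourly: list[float]) -> float:
    """Chance of rain in at least one hour, assuming hours are independent.

    1 - P(dry every hour) = 1 - product of (1 - p) over all hours.
    """
    return 1 - prod(1 - p for p in hourly)


N_HOURS = 12  # 8 am through 7 pm, one slot per hour (assumed)

# Hypothetical bounds: each real hourly value lies between 30% and 40%.
print(round(p_rain_any_hour([0.4] * N_HOURS), 3))  # 0.998
print(round(p_rain_any_hour([0.3] * N_HOURS), 3))  # 0.986
```

Raising any single hourly chance can only raise the result. So any mix of 30-40% hours over those 12 slots lands between these two values, which is consistent with the roughly 99% figure.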
Based on the hourly forecast, there’s a 99% chance that it will rain at some point tomorrow! That’s a far cry from the 40% chance given in the daily forecast. For there to be a 40% chance of rain over the course of the day, there could only be about a 4% chance of rain during each hour (solving 1 - (1 - p)^12 = 0.4 for the 12 hours from 8 am to 7 pm gives p ≈ 0.042). Alternatively, you could have a 40% chance for one hour and a 0% chance for all of the other hours, or some scenario in between. However, you cannot have a 30-40% chance of rain for every hour and still have only a 40% chance of rain for the whole day.
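The 4% figure comes from running the calculation backwards. If all 12 hours share the same chance p and are independent, then 1 - (1 - p)^12 = 0.4, so p = 1 - 0.6^(1/12). Here is a minimal sketch; the helper name is hypothetical.

```python
def hourly_chance_for_daily(daily: float, n_hours: int) -> float:
    """Per-hour chance p such that 1 - (1 - p) ** n_hours == daily.

    Assumes n_hours independent hours that all share the same chance p.
    """
    return 1 - (1 - daily) ** (1 / n_hours)


p = hourly_chance_for_daily(0.4, 12)
print(round(p, 3))                  # 0.042
print(round(1 - (1 - p) ** 12, 3))  # 0.4 (round trip back to the daily figure)
```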
I can think of a few possible explanations for this discrepancy. One would be that the people at The Weather Channel have no understanding of basic statistics. I find this explanation hard to believe, although not impossible. A second explanation would be that a 40% chance of precipitation does not in fact mean that there is a 40% probability that it will rain at some point over that period of time, but instead has some more opaque meaning. A third explanation, similar to the second, would be that The Weather Channel applies some correction factor to the hourly probabilities to increase them to numbers that people are more used to.
My intuition tells me that the correct explanation is probably the second or the third. But, if this is the case, what specifically do these probabilities mean? Maybe I’m naive, but this practice seems misleading to me, and, regardless, the correct meaning of these statistics should be spelled out on the website. Is this common practice in meteorology, or just specific to The Weather Channel? I don’t know, but maybe someone can explain this whole phenomenon to us.
Update: Check out the comments below for an informative discussion on the topic. One point from that discussion worth noting here is that my calculations above assume that each hour is independent of every other hour. In reality, the hours aren’t totally independent. If it rains one hour, it’s more likely to rain the next hour as well (i.e., there is a high probability that the rain will continue from one hour to the next). This dependence is weaker for the hour after that, and times separated by a few hours are virtually independent. Accounting for this dependence makes the calculations above more complex and lowers the calculated probabilities, but they will still be higher than the probabilities given in the daily forecast.
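One way to see how dependence between hours lowers the daily figure is a toy two-state Markov chain. In this model each hour is wet or dry, every hour has the same marginal chance p of rain, and a single parameter rho (the correlation between consecutive hours) controls how likely rain is to continue. In such a chain the correlation between hours k apart is rho^k, so it fades over a few hours, roughly as described above. The model and the names p, rho, and n_hours are illustrative assumptions of mine, not how The Weather Channel or meteorologists actually compute forecasts.

```python
from itertools import product


def transition_probs(p: float, rho: float) -> tuple[float, float]:
    """Return (a, b) = (P(wet | previous hour dry), P(dry | previous hour wet)).

    Chosen so the chain keeps marginal rain chance p and the correlation
    between consecutive hours equals rho (valid for 0 <= rho <= 1).
    """
    return p * (1 - rho), (1 - p) * (1 - rho)


def p_rain_any_hour_markov(p: float, rho: float, n_hours: int) -> float:
    """Closed form: 1 - P(first hour dry) * P(dry stays dry) ** (n_hours - 1)."""
    a, _ = transition_probs(p, rho)
    return 1 - (1 - p) * (1 - a) ** (n_hours - 1)


def p_rain_any_hour_bruteforce(p: float, rho: float, n_hours: int) -> float:
    """Cross-check: sum the probability of every wet/dry sequence with any rain."""
    a, b = transition_probs(p, rho)
    step = {(0, 0): 1 - a, (0, 1): a, (1, 0): b, (1, 1): 1 - b}  # 1 = wet
    total = 0.0
    for seq in product((0, 1), repeat=n_hours):
        if not any(seq):
            continue  # the all-dry day contributes nothing
        prob = p if seq[0] else 1 - p
        for prev, cur in zip(seq, seq[1:]):
            prob *= step[(prev, cur)]
        total += prob
    return total


for rho in (0.0, 0.5, 0.9):
    closed = p_rain_any_hour_markov(0.4, rho, 12)
    brute = p_rain_any_hour_bruteforce(0.4, rho, 12)
    print(rho, round(closed, 3), round(brute, 3))
```

With rho = 0 this reduces to the independent calculation (about 0.998). With rho = 0.5 it drops to about 0.948, and with rho = 0.9 to about 0.617. These are lower, as noted in the update, but still well above 40%. In this toy model only rho = 1 (rain either lasts all day or never starts) brings the daily chance all the way down to the hourly 40%.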