In this column, Richard Muller claims that McKitrick and McIntyre have shown that the hockey stick graph is an “artifact of poor mathematics”. If you have been following the global warming debate, this claim should look familiar, because McKitrick and McIntyre made the same claim last year. So what’s new? Last year they claimed that the hockey stick was the product of “collation errors, unjustifiable truncations of extrapolation of source data, obsolete data, geographical location errors, incorrect calculations of principal components, and other quality control defects.” Now they are saying that the hockey stick is the product of improper normalization of the data. This is an improvement on their previous claims, since this one looks reasonably simple to test. William Connolley has looked at M&M’s paper and MBH’s code and thinks M&M are probably wrong:
But (having read their paper) I now think I understand what they think the problem is (aside: they complain about data issues with some series, but I think this is beside the point; the main point they are making is below), and I think that they are probably wrong, based on reading MBH’s Fortran (aside: Fortran is a terrible language for doing this stuff; they should use a vector language like IDL). But anyway:
Let's assume for the moment, for simplicity, that these series run from 1000 AD to 1980. MBH want to calibrate them against the instrumental record, so they standardise them to 1902–1980. 1902–1980 is the “training period”.
What M&M are saying (and Muller is repeating) is (and I quote): the data
“were first scaled to the 1902-1980 mean and standard deviation, then the PCs were computed using singular value decomposition (SVD) on the transformed data…”
They complain that this means:
“For stationary series in which the 1902–1980 mean is the same as the 1400–1980 mean, the MBH98 method approximately zero-centers the series. But for those series where the 1902–1980 mean shifts (up or down) away from the 1400–1980 mean, the variance of the shifted series will be inflated.”
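The effect M&M describe can be illustrated with a small Python sketch. It is only an illustration of the claim as they state it, not MBH's code: the Gaussian noise, the +2 offset and the helper names short_centre and mean_square are all made up for this example. Both series are centred and scaled on 1902–1980 only, and the average squared value about zero is compared, on the assumption that this is what an SVD of data that are not re-centred picks up.

```python
import random
import statistics

years = list(range(1400, 1981))            # 1400..1980 inclusive
train = slice(years.index(1902), None)     # 1902..1980 training period

def short_centre(series):
    """Centre and scale on the 1902-1980 segment only (M&M's description)."""
    seg = series[train]
    mu, sd = statistics.fmean(seg), statistics.pstdev(seg)
    return [(x - mu) / sd for x in series]

def mean_square(series):
    """Average of x**2: the spread about zero, which an SVD on data
    that are not re-centred will treat as variance."""
    return sum(x * x for x in series) / len(series)

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in years]

# Stationary: the 1902-1980 mean is (roughly) the long-term mean.
stationary = noise[:]
# Shifted: identical noise, but the 1902-1980 segment is offset by +2.
shifted = noise[:train.start] + [x + 2.0 for x in noise[train]]

print(round(mean_square(short_centre(stationary)), 2),
      round(mean_square(short_centre(shifted)), 2))
```

For the shifted series, the whole pre-1902 stretch ends up sitting well away from zero, so its apparent variance is several times that of the stationary series even though the underlying noise is identical.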
This is a plausible idea: if you take two series that are statistically identical, except that one trends up at the end while the other happens to be flat, compute the SD of just the end bit, and then scale the series to this SD, you would indeed inflate the variance of the up-trending series artificially. But hold on a minute… this is odd… why would you scale the series to the SD? You would expect to scale the series by the SD, that is, divide by it, which would in fact reduce the variance of the up-trending series. And, you might well think, shouldn’t you remove a linear trend over 1902–1980 before computing the SD?
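The point about dividing by the SD, and about detrending first, can be checked with another short Python sketch. Again this is illustrative only: the noise, the 4-unit linear rise and the helper names detrend and early_sd_after_scaling are assumptions made for the example. Two series share the same noise, but one gets a straight-line rise across 1902–1980. Each whole series is divided by its own 1902–1980 SD, either raw or after removing a least-squares line, and the spread of the pre-1902 part is compared.

```python
import random
import statistics

years = list(range(1000, 1981))            # 1000..1980 inclusive
train = slice(years.index(1902), None)     # 1902..1980 training period
n_train = len(years) - train.start

def detrend(values):
    """Residuals after removing the least-squares straight line."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = statistics.fmean(values)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    slope = sxy / sxx
    return [y - y_bar - slope * (x - x_bar) for x, y in enumerate(values)]

def early_sd_after_scaling(series, remove_trend):
    """Divide the whole series BY its 1902-1980 SD (optionally of the
    detrended segment) and return the SD of the pre-1902 part."""
    seg = series[train]
    if remove_trend:
        seg = detrend(seg)
    sd = statistics.pstdev(seg)
    return statistics.pstdev([x / sd for x in series[:train.start]])

rng = random.Random(1)
flat = [rng.gauss(0.0, 1.0) for _ in years]
# Same noise, plus a straight-line rise of 4 units across 1902-1980.
trending = flat[:train.start] + [
    x + 4.0 * k / (n_train - 1) for k, x in enumerate(flat[train])]

for remove_trend in (False, True):
    print("detrended" if remove_trend else "raw",
          round(early_sd_after_scaling(flat, remove_trend), 3),
          round(early_sd_after_scaling(trending, remove_trend), 3))
```

With the raw SD, the trending series has the larger 1902–1980 SD, so dividing by it shrinks its early part: variance reduced, not inflated. With the detrended SD the two series are scaled identically, because adding an exact straight line leaves least-squares residuals unchanged.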
So we need to look at MBH’s software, not M&M’s description of it. MBH’s software is here, and you can of course read it yourself… Fortran is so easy to read…
What they do is the following (search down past where the data are read in until you get to 9999 continue):
1. remove the 1902–1980 mean
2. calculate the SD over this period
3. divide the whole series by this SD, point by point
At this point, the new data are in the situation I described above: datasets that trend upwards at the end have had their variance reduced, not increased. But there is more…
4. remove the linear trend from the rescaled 1902–1980 segment
5. compute the SD of this detrended 1902–1980 data
6. divide the whole series by this SD.
This was exactly what I was expecting to see: remove the linear trend before computing the SD.
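The six steps can be sketched in Python as follows. This is a reading of the description above, not a translation of MBH's Fortran. Assumptions: the detrended segment is used only to compute the second SD, step 6 divides the whole non-detrended series, SDs use the population formula, and the names mbh_standardise and detrend and the input series are hypothetical.

```python
import statistics

def detrend(values):
    """Residuals after removing the least-squares straight line."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = statistics.fmean(values)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    slope = sxy / sxx
    return [y - y_bar - slope * (x - x_bar) for x, y in enumerate(values)]

def mbh_standardise(series, train):
    """Hypothetical re-creation of steps 1-6; `train` is a slice
    selecting the 1902-1980 training period."""
    # 1. remove the training-period mean from the whole series
    mu = statistics.fmean(series[train])
    y = [x - mu for x in series]
    # 2. SD over the training period
    s1 = statistics.pstdev(y[train])
    # 3. divide the whole series by that SD
    u = [v / s1 for v in y]
    # 4. remove a linear trend from the rescaled training segment only
    resid = detrend(u[train])
    # 5. SD of the detrended training data
    s2 = statistics.pstdev(resid)
    # 6. divide the whole (non-detrended) series by this second SD
    return [v / s2 for v in u]

# Made-up series: a bounded wiggle plus a rise starting in 1901.
years = list(range(1000, 1981))
train = slice(years.index(1902), None)     # 1902..1980
series = [0.01 * max(0, yr - 1900) + ((yr * 37) % 11 - 5) / 5 for yr in years]

z = mbh_standardise(series, train)
print(f"1000 AD: {z[0]:.3f}   1980 AD: {z[-1]:.3f}")
```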
Then the SVD-type stuff begins. So… what does that all mean? It certainly looks a bit odd, because steps 1–3 appear redundant: the scaling done in steps 4–6 is all you need. Is the scaling in steps 1–3 harmful? Not obviously.
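Under the reading of the steps given above (an assumption: the detrended segment is used only to compute the second SD, and step 6 divides the whole, non-detrended series), the division in steps 2–3 drops out algebraically. Write x_t for a raw series, T for the 1902–1980 training period, x̄_T for its training mean, s_1 for the SD from step 2, and e_t for the residuals of x_t about a least-squares line fitted over T. Least-squares residuals are unchanged by subtracting a constant and scale in proportion to the data, so the detrended segment in step 4 is e_t / s_1.

```latex
\begin{aligned}
u_t &= \frac{x_t - \bar{x}_T}{s_1} && \text{(steps 1--3)}\\
s_2 &= \mathrm{SD}_T\!\left(\frac{e}{s_1}\right) = \frac{\mathrm{SD}_T(e)}{s_1} && \text{(steps 4--5)}\\
z_t &= \frac{u_t}{s_2} = \frac{x_t - \bar{x}_T}{\mathrm{SD}_T(e)} && \text{(step 6)}
\end{aligned}
```

So s_1 cancels, and the end result is the same as doing step 1 followed by steps 4–6; the only part of steps 1–3 that carries through is the subtraction of the 1902–1980 mean in step 1.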
Perhaps someone would care to go through and check this. If I haven’t made a mistake, then I think M&M’s complaints are unjustified and Nature was correct to reject their article.