Deja Hockey Stick

In this column, Richard Muller claims that McKitrick and McIntyre have shown that the hockey stick graph is an “artifact of poor mathematics”. If you have been following the global warming debate this claim should look familiar, because McKitrick and McIntyre made the same claim last year as well. So what’s new? Well, last year they claimed that the hockey stick was the product of “collation errors, unjustifiable truncations or extrapolation of source data, obsolete data, geographical location errors, incorrect calculations of principal components, and other quality control defects.” Now they are saying that the hockey stick is the product of improper normalization of the data. This is an improvement on their previous claims, since it seems that it will be reasonably simple to test. William Connolley has looked at the data and thinks M&M are probably wrong:

But (having read their paper) I now think I understand what they think the problem is (aside: they complain about data issues with some series but I think this is beside the point: the main point they are talking about is below), and I think that they are probably wrong, based on reading MBH’s Fortran (aside: Fortran is a terrible language for doing this stuff, they should use a vector language like IDL). But anyway:

Let’s for the moment assume for simplicity that these series run from 1000 (AD) to 1980. MBH want to calibrate them against the instrumental record so they standardise them to 1902–1980. 1902–1980 is the “training period”.

What M&M are saying (and Muller is repeating) is (and I quote): the data

“were first scaled to the 1902-1980 mean and standard deviation, then the PCs were computed using singular value decomposition (SVD) on the transformed data…”

they complain that this means that:

“For stationary series in which the 1902–1980 mean is the same as the 1400–1980 mean, the MBH98 method approximately zero-centers the series. But for those series where the 1902–1980 mean shifts (up or down) away from the 1400–1980 mean, the variance of the shifted series will be inflated.”

This is a plausible idea: if you take 2 series, statistically identical, but where one trends up at the end while the other happens to be flat, and you compute the SD of just the end bit, and then scale the series to this SD, then you would indeed inflate the variance of the up trending series artificially. But hold on a minute… this is odd… why would you scale the series to the SD? You would expect to scale the series by the SD. Which would, in fact, reduce the variance of upwards trending series. And also, you might well think, shouldn’t you take out a linear trend over 1902–1980 before computing the SD?
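A minimal sketch of the “scale by the SD” point, using two made-up series rather than any MBH proxy data. The series, the 0.1-per-year ramp, and the helper names are all assumptions chosen for illustration; the population SD is used only for simplicity.

```python
from statistics import pstdev

# Two made-up series covering 1000-1980; neither is real proxy data.
years = list(range(1000, 1981))
wiggle = [1.0 if y % 2 == 0 else -1.0 for y in years]  # shared background variability

flat = wiggle[:]  # stays flat through 1902-1980
# Identical to `flat` up to 1901, then ramps upward by 0.1 per year.
trending = [w + 0.1 * max(0, y - 1901) for w, y in zip(wiggle, years)]


def scale_by_calibration_sd(series, start=1902, end=1980):
    """Divide the whole series by the SD of its start..end segment."""
    segment = [v for v, y in zip(series, years) if start <= y <= end]
    sd = pstdev(segment)
    return [v / sd for v in series]


def pre_calibration_sd(series, end=1901):
    """SD of the part of the series before the calibration period."""
    return pstdev([v for v, y in zip(series, years) if y <= end])


flat_scaled = scale_by_calibration_sd(flat)
trending_scaled = scale_by_calibration_sd(trending)

print("flat, pre-1902 SD after scaling:    ", pre_calibration_sd(flat_scaled))
print("trending, pre-1902 SD after scaling:", pre_calibration_sd(trending_scaled))
```

Before scaling, the two series are identical up to 1901. After dividing by the 1902–1980 SD, the trending series ends up with the smaller pre-1902 spread, because its calibration-period SD is larger.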

So we need to look at MBH’s software, not M&M’s description of it. MBH’s software is here, and you can of course read it yourself… Fortran is so easy to read…

What they do is (search down over the reading in data till you get to 9999 continue):

  1. remove the 1902-1980 mean
  2. calc the SD over this period
  3. divide the whole series by this SD, point by point

At this point, the new data are in the situation I described above: datasets that trend upwards at the end have had their variance reduced not increased. But there is more…

  4. remove the linear trend from the new 1902-1980 series
  5. compute the SD again for 1902-1980 of the detrended data
  6. divide the whole series by this SD.

This was exactly what I was expecting to see: remove the linear trend before computing the SD.

Then the SVD type stuff begins. So… what does that all mean? It certainly looks a bit odd, because steps 1–3 appear redundant. The scaling done in 4–6 is all you need. Is the scaling of 1–3 harmful? Not obviously.
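The redundancy of the first scaling can be checked with a small sketch. This follows the six steps as described above, not MBH’s actual Fortran; the function names, the `do_first_scaling` switch, the test series, and the use of the population SD are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def detrend(values):
    """Remove the least-squares straight line from a list of values."""
    t = list(range(len(values)))
    t_mean, v_mean = mean(t), mean(values)
    slope = (sum((ti - t_mean) * (vi - v_mean) for ti, vi in zip(t, values))
             / sum((ti - t_mean) ** 2 for ti in t))
    return [vi - (v_mean + slope * (ti - t_mean)) for ti, vi in zip(t, values)]


def normalise(series, years, start=1902, end=1980, do_first_scaling=True):
    """Apply the six steps; optionally skip the first SD scaling (steps 2-3)."""
    idx = [i for i, y in enumerate(years) if start <= y <= end]
    # Step 1: remove the calibration-period mean.
    m = mean([series[i] for i in idx])
    out = [v - m for v in series]
    # Steps 2-3: divide the whole series by the calibration-period SD.
    if do_first_scaling:
        sd = pstdev([out[i] for i in idx])
        out = [v / sd for v in out]
    # Steps 4-5: SD of the detrended calibration-period data.
    sd_detrended = pstdev(detrend([out[i] for i in idx]))
    # Step 6: divide the whole series by that SD.
    return [v / sd_detrended for v in out]


# Made-up series: a slow oscillation plus an upward ramp after 1901.
years = list(range(1000, 1981))
series = [math.sin(y / 7.0) + 0.02 * max(0, y - 1901) for y in years]

full = normalise(series, years, do_first_scaling=True)
short = normalise(series, years, do_first_scaling=False)

print("largest difference:", max(abs(a - b) for a, b in zip(full, short)))
```

The two results agree to rounding error: the first division by the SD is undone by the second, since the detrended SD scales by the same factor. Only the mean removal in step 1 carries through to the output.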

Perhaps someone would care to go through and check this. If I haven’t made a mistake then I think M&M’s complaints are unjustified and Nature correct to reject their article.

My previous experience with McKitrick gives me no confidence in his work. David Appell is also sceptical of this latest attack on the hockey stick.


  1. #1 ben
    October 18, 2004

    What about this bit by some guy named “von Storch” in which it is argued that the historical data is not compatible with the modern data, and thus the hockey stick is simply a result of comparing apples with oranges?

    Seems fair to me, if this is actually the case, that the tree-ring data (which I presume is what the historical results are based on) would have a natural damping effect, while the modern instrument data of course has no damping. I’m not entirely sure where any of the data comes from, just playing devil’s advocate here.

    Now, if this is indeed the case, the modern data could be re-computed with a damping term added, something that guesses at the damping from the tree-ring data.

    What was that thing about FORTRAN? I personally dislike FORTRAN (in the same way I dislike red cars) but how is it possible that it is a bad language for doing these sorts of calculations? Isn’t the vector stuff superior for computationally intensive stuff like massive CFD simulations and what not, due to additional algorithm efficiency when the data is in vector form? I don’t see simple data reduction falling into this category.

  2. #2 ben
    October 18, 2004

    sorry, here’s the von Storch link.

  3. #3 Charles Stewart
    October 18, 2004

    I personally dislike FORTRAN (in the same way I dislike red cars) but how is it possible that it is a bad language for doing these sorts of calculations?

    It’s perfectly efficient, but the problem is that FORTRAN does a worse job of reflecting the concepts that scientists use in reasoning about and talking about these sorts of things, and therefore there is hard work involved in checking that a FORTRAN program does the calculation the authors claim it does. William Connolley is suggesting this is exactly the problem here.

  4. #4 Tom
    October 18, 2004

    “What about this bit by some guy named “von Storch” in which it is argued that the historical data is not compatible with the modern data, and thus the hockey stick is simply a result of comparing apples with oranges?”

    Von Storch, IIRC, was the former editor of ‘Climate Research’, who resigned (with several others of the editorial board) after Soon & Baliunas’ infamous paper was published, so he’s one of the good guys.

  5. #5 Yelling
    October 18, 2004

    Ben: I haven’t been able to read von Storch yet, but I feel that he has shown himself to be a conscientious scientist (in the Climate Research kafuffle a while back). Of course his work was based on climate models, so anyone who accepts it implicitly accepts the accuracy of the climate models, which renders Mann’s paper of little consequence to the current debate (i.e. if the climate models are accurate then let’s use them and get on with it).

    In regards to M&M’s recent work, a quick read through the reviews by the Nature reviewers shows that it received a fairly cool welcome and in fact I don’t think anyone really said it should be published.

  6. #6 Brett Bellmore
    October 19, 2004

    Isn’t the simplest test of their claims, and the simplest way to “read” the program, to run it, and supply it with random data? Either it will produce a “hockey stick” out of data that doesn’t have one, or it won’t. End of controversy.

  7. #7 Louis Hissink
    October 20, 2004

    While it might be possible to fit a linear trend to the data, it is, none the less, statistical masturbation.

  8. #8 Eli Rabett
    October 21, 2004

    No Brett, you have to run it many times with many sets of random data to have any confidence.

  9. #9 William Connolley
    October 27, 2004

    The von Storch paper is very good: very thought provoking. It’s also not easy to understand: you have to work at it. I’m not finished yet. I strongly recommend going to read the original article, not’s misrepresentation.

    The von S paper is based on using ECHO-G, a climate model (you remember, the things that skeptics consider fatally flawed…) to generate “pseudoproxies”, from which the original (known, in the case of the model) climate of the last 1000 years is reconstructed. They then play with these a bit to find out how well the reconstruction goes. And they discover that the proxies tend to underestimate the long-term variance.

    Interestingly, ECHO-G (with the forcings they use) thinks that today is the warmest in the last 1000 years…

    I think the original is available from

    Or, if you have a subscription:

  10. #10 Ken Miles
    October 27, 2004

    I agree with William, the von Storch paper (which I’m still working through) is very interesting, and is also the first decent criticism which I’ve seen on the “Hockey Stick”.