There is an interesting new post up at KlimaZweibel about a paper by Smerdon et al. This is going to be all over the place very soon, so I may as well jump in.
The title, of course, is a snark at RC; see the article A Mistake with Repercussions, which points out some errors in a Zorita and von Storch paper (they got their model setup wrong). [I've just snarked them in their comments; it will be interesting to see if it stays.]
In this case the problem is rather more arcane, but worth explaining, so let me do that first.
[Update: no, let me first point out that there is a response by Rutherford et al. which appears to say that they fixed all these problems ages ago.]
[And second, let me recommend that instead of reading about yet another minor fuss, you read the lovely post about DLR by SoD.]
Suppose you have created a spiffy new method for reconstructing past climate from proxy data, intended to be used over the past 1000-2000 years or whenever. You can try to test that on real-world data, of course, but you run into an immediate problem: you don't know in advance what the right answer is, so you'll never know how good your method is.
The obvious solution to this problem is to use fake data (OMG I've used the word "fake"! Scandal! Pushing it a bit, I could even call this a "Trick". Don't tell the fools). One approach would be to use simple random data, but this then runs into another problem: the statistics of the random data won't look anything like the real-world data. So, a far better solution is to take climate model data and use that as your testbed. In this case, you take a long integration of one or more climate models. This is great: the model looks something like the real world (though it doesn't have to be desperately realistic, and it doesn't have to have tracked the actual yearly or decadal rise and fall of real-world temperatures), and you now do know the Right Answer: viz, the mean annual temperature (or the hemispheric mean, or whatever it is you care about), because you have the full global model output and can trivially calculate it.
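To make "trivially calculate it" concrete, here is a minimal Python sketch of computing the Right Answer from a gridded model field. It assumes a regular latitude-longitude grid stored as a list of rows, one per latitude, and weights each cell by cos(latitude) so that the small cells near the pole don't count as much as the big ones near the equator. The function name hemispheric_mean is my own invention, not anything from the papers.

```python
import math

def hemispheric_mean(field, lats, north=True):
    """Area-weighted mean of one hemisphere of a regular lat-lon grid.

    field[j] is the row of values at latitude lats[j] (degrees); every row has
    one value per longitude. Cells are weighted by cos(latitude), which is
    proportional to cell area on a regular grid. Rows exactly on the equator
    are skipped.
    """
    total = 0.0
    weight = 0.0
    for lat, row in zip(lats, field):
        in_hemisphere = (lat > 0) if north else (lat < 0)
        if in_hemisphere:
            w = math.cos(math.radians(lat))
            total += w * sum(row)
            weight += w * len(row)
    return total / weight


# Toy check: two latitude rows, 30N at 2 degrees and 60N at 4 degrees.
# The 30N row covers more area, so the answer comes out below the plain average of 3.
nh = hemispheric_mean([[2.0, 2.0], [4.0, 4.0]], [30.0, 60.0])
print(f"NH mean: {nh:.3f}")
```

Run that once per model year and you have the target series that any reconstruction gets scored against.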
Then, you take the known locations of your proxies (and you can even include the changing number of locations over time, or model the effects of including more or fewer) and you interpolate the model data to these locations. And you can even add a carefully-calibrated amount of noise to the interpolated value, to mimic the proxy not providing a true temperature. And then you run this data through your method, and you then compare it to the Right Answer.
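Here is a toy version of that whole loop in Python, as a sketch under loudly stated assumptions: the "model" is a made-up grid holding a sine-wave signal plus noise rather than a real GCM run, "interpolation" is just reading the grid cell a site falls in, and the reconstruction method is plain composite-plus-scale (CPS), a simple stand-in and emphatically not the RegEM methods that Mann et al. were testing. All the function names and numbers are mine.

```python
import math
import random
import statistics

def pseudoproxies(field_series, sites, noise_sd, seed=0):
    """Sample a model field at proxy sites and add white noise.

    field_series[t][j][i] is the model field in year t; sites is a list of
    (j, i) grid indices. "Interpolation" is simplified to reading the grid
    cell that contains each site.
    """
    rng = random.Random(seed)
    return [[field[j][i] + rng.gauss(0.0, noise_sd) for field in field_series]
            for (j, i) in sites]

def standardize(xs, calib):
    """Rescale xs to zero mean and unit std over the calibration years."""
    mu = statistics.mean([xs[k] for k in calib])
    sd = statistics.pstdev([xs[k] for k in calib])
    return [(x - mu) / sd for x in xs]

def cps_reconstruction(proxies, target, calib):
    """Composite-plus-scale: average the standardized proxies, then give the
    composite the target's mean and std over the calibration years."""
    z = [standardize(p, calib) for p in proxies]
    composite = standardize([statistics.mean(col) for col in zip(*z)], calib)
    t_mu = statistics.mean([target[k] for k in calib])
    t_sd = statistics.pstdev([target[k] for k in calib])
    return [t_mu + t_sd * c for c in composite]

def correlation(a, b):
    """Pearson correlation of two equal-length series."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical toy "model run": 100 years on a 6 x 12 grid, every cell sharing
# a slow common signal plus independent local weather.
rng = random.Random(42)
years = range(100)
signal = [math.sin(2 * math.pi * t / 40) for t in years]
field_series = [[[signal[t] + rng.gauss(0.0, 0.5) for _ in range(12)]
                 for _ in range(6)] for t in years]
# The Right Answer: here a plain grid mean (no area weighting, to keep it short).
target = [statistics.mean([v for row in f for v in row]) for f in field_series]

sites = [(j, i) for j in range(0, 6, 2) for i in range(0, 12, 3)]  # 12 made-up sites
proxies = pseudoproxies(field_series, sites, noise_sd=0.5, seed=1)
calib, verify = range(50, 100), range(0, 50)
recon = cps_reconstruction(proxies, target, calib)
r = correlation([recon[k] for k in verify], [target[k] for k in verify])
print(f"verification-period correlation with the Right Answer: {r:.2f}")
```

The thing to notice is the structure rather than the number: calibrate on one stretch of years, reconstruct the rest, and score against a target you know exactly because you computed it from the model yourself.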
What Smerdon et al. say is that Mann et al. have made various errors in their handling of the model data used to test the reconstruction methods: they have got the smoothing wrong, or they have shifted the proxy locations by 180 degrees of longitude. That doesn't invalidate the methods (a point I'm sure will get lost in the blogonoise) but (if correct) would invalidate the testing.
Add1: so, the point about the test-method only needing to work on data that has some kind of bearing on the real world is the explanation for
As Smerdon et al. correctly point out, this error does not impact the qualitative conclusions drawn from the results and described in Mann et al., 2007a (cf. Figure 1). The global field was still reasonably sampled, and the pseudoproxy locations, while not correct in longitude, are correct in latitude, and reasonably sample the field. It should also be noted that real proxy locations can vary considerably based on various inclusion/exclusion metrics that accept or reject proxies when building an actual proxy network. In fact, our network “D” in Mann et al., 2007a actually used random pseudoproxy locations.
in the R et al. reply. Even if you get all the locations wrong by 180 degrees longitude, your test of the method is still probably a reasonable test (note: that is, as a test of the method. Remember that is what all this is about) because the climate is moderately symmetrical wrt change of longitude (but isn’t wrt change of latitude, obviously).
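A toy illustration of that symmetry argument, in Python. Everything here is invented for the purpose: a 10-degree longitude grid (NLON), a field that depends strongly on latitude plus a weaker wave in longitude (zonal_amp), and five made-up sites. Shifting a site by 180 degrees of longitude leaves its latitude alone, so the error it introduces is bounded by the size of the longitude-dependent part of the field.

```python
import math

NLON = 36  # hypothetical 10-degree grid: column i sits at i * 10 degrees east

def shifted_lon_index(i, nlon=NLON):
    """Column actually read if a site's longitude is off by 180 degrees."""
    return (i + nlon // 2) % nlon

def toy_field(lat, i, zonal_amp=2.0):
    """Hypothetical annual-mean temperature (deg C): a strong dependence on
    latitude plus a weaker wave in longitude of amplitude zonal_amp."""
    lon = i * 360.0 / NLON
    return 30.0 * math.cos(math.radians(lat)) - 5.0 + zonal_amp * math.cos(math.radians(lon))

sites = [(-60, 3), (-20, 10), (10, 25), (45, 7), (70, 30)]  # made-up (lat, lon column) pairs
right = [toy_field(lat, i) for lat, i in sites]
wrong = [toy_field(lat, shifted_lon_index(i)) for lat, i in sites]

# Latitude is untouched by the error, so each shifted sample differs from the
# right one by at most 2 * zonal_amp, while the spread across sites is far larger.
max_diff = max(abs(a - b) for a, b in zip(right, wrong))
spread = max(right) - min(right)
print(f"largest per-site error {max_diff:.1f} C vs spread across sites {spread:.1f} C")
```

Crank zonal_amp up and the argument weakens, which is the sense in which it relies on the climate being only moderately asymmetric in longitude.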
Add2: there is more than a suggestion that a certain amount of academic point-scoring may be going on here. Rutherford et al. conclude
In summary, the issues raised by Smerdon et al. (2010), while factual, have no material impact on any of the key conclusions of Mann et al. (2007a). Additionally, they have no impact whatsoever on subsequent studies by us (Mann et al., 2009; Rutherford et al., 2010) where the technical errors they note did not occur, and which reach identical conclusions. In light of these considerations, we are puzzled as to why, given the minor impact the issues raised actually have, the matter wasn’t dealt with in the format of a comment/reply. Alternatively, had Smerdon et al. taken the more collegial route of bringing the issue directly to our attention, we would have acknowledged their contribution in a prompt corrigendum. We feel it unfortunate that neither of these two alternative courses of action were taken.
And indeed that last point has some force. If you find something wrong with someone's paper, the polite course of action isn't to rush into print saying "ha ha you're wrong" but to raise it with the original authors. Of course, if you do that then you don't get an all-important publication point out of it (and a comment counts for less than an article, too). If you raise it with the authors and they blow you off, then of course you can go into print.
Add3: probably more important than this is to look at fig 5(a) from Smerdon et al. What strikes you there is not the difference between the red and blue lines, but the difference between the red/blue lines and the black line, which is to say that neither the wrongly-sampled nor the correctly-sampled reconstruction is doing a good job of reconstructing the variance of the Right Answer. As Rutherford et al. say
First, Mann et al., 2005 used the Regularized Expectation Maximization method with Ridge Regression (RegEM-Ridge) as a regularization method. RegEM-Ridge has been shown to suffer from a loss of variance when reconstructing the hemispheric mean (Zwiers and Lee, pers. comm., August 2006; Mann et al., 2007a,b; Smerdon and Kaplan, 2007) which is not the case with RegEM-TTLS (Truncated Total Least Squares). This led Mann et al., 2007 to use the TTLS implementation of RegEM. This being the case, we will confine our comments to Mann et al., 2007a. However, it is important that the reader recognize that Smerdon et al. (2010) used RegEM-Ridge and that their results shown in Figure 5(a) show the expected variance loss of a RegEM-Ridge reconstruction whereas RegEM-TTLS faithfully reconstructs the target series (Figure 1).
And indeed, if you look at their figure 1 you see that the shiny new method (RegEM-TTLS) does a much better job.
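For anyone wondering why a ridge-regularized method should lose variance at all, here is a minimal one-predictor sketch in Python. It is not RegEM (which infills a whole field iteratively), and it says nothing about TTLS, which regularizes differently by truncating small singular values; it only shows the generic mechanism. Regressing a target on a noisy proxy already damps the reconstruction's variance, and adding a ridge penalty shrinks the coefficient toward zero and damps it further. The data are random numbers standing in for a pseudoproxy experiment, and the ridge strength of 100 is an arbitrary choice for illustration.

```python
import random
import statistics

def ridge_slope(x, y, ridge=0.0):
    """Slope of y regressed on x (both centred). ridge=0 is ordinary least
    squares; ridge > 0 adds an L2 penalty: beta = Sxy / (Sxx + ridge)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / (sxx + ridge)

def reconstruct(proxy, truth, ridge):
    """Predict truth from proxy using the (possibly penalized) slope."""
    b = ridge_slope(proxy, truth, ridge)
    mp, mt = statistics.mean(proxy), statistics.mean(truth)
    return [mt + b * (p - mp) for p in proxy]

rng = random.Random(7)
truth = [rng.gauss(0.0, 1.0) for _ in range(200)]   # stand-in for the Right Answer
proxy = [t + rng.gauss(0.0, 0.7) for t in truth]    # noisy pseudoproxy

var_truth = statistics.pvariance(truth)
var_ols = statistics.pvariance(reconstruct(proxy, truth, ridge=0.0))
var_ridge = statistics.pvariance(reconstruct(proxy, truth, ridge=100.0))
print(f"fraction of variance kept: OLS {var_ols / var_truth:.2f}, ridge {var_ridge / var_truth:.2f}")
```

A real method picks its amount of shrinkage from the data rather than fixing it, but a ridge penalty always pulls the coefficients toward zero, so damped reconstructions like the red and blue lines are the kind of thing you'd expect.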
Add4: at KZ, Eduardo said: They also assert that the errors have been corrected in subsequent studies. And yet, Rutherford et al. continue to show the wrong NH mean temperature simulated by the ECHO-G model – compare figure 1a in the manuscript by Rutherford and Figure 5b in Smerdon et al 2010. It is obvious that these error have not be corrected. to which I replied:
Fair point. I asked about this, and the wrong figure was transcribed. Looking again, the PDF response has been updated to show the correct figure. Both pix are in fig 1 of the Rutherford et al. reply to Smerdon (http://www.meteo.psu.edu/~mann/Mann/articles/articles.html); it looks like they transcribed the wrong one.
Poissonally I’d prefer it if people kept old versions around to compare to rather than updating; but then again, I don’t do that with the blog, cos the software won’t let me.
Add5: the transcription error is now confirmed by Mann in an RC comment.