McIntyre’s irrational demands

In a comment on a post about the Barton letters, Ed Snack claimed that

Michael Mann made an error in MBH98, he confused the square root of the cosine of the latitude with the cosine

Now if you look at MBH98, cosine latitude is only mentioned here:

Northern Hemisphere (NH) and global (GLB) mean temperature are estimated as areally-weighted (ie, cosine latitude) averages over the Northern hemisphere and global domains respectively

I did a bit of searching and found that Snack’s source is this
statement in the supplementary material for von Storch et al.’s paper
“Reconstructing Past Climate from Noisy Data” DOI:
10.1126/science.1096109:

Our implementation of the MBH method essentially follows their
description in their original paper (S17). The statistical model was
calibrated in the period 1900-1980. Monthly near surface-temperature
anomalies were standardized and subjected to an Empirical Orthogonal
Function Analysis, in which each grid point was weighted by (cos
φ)^(1/2), where φ is the latitude (Mann et al. 1998
erroneously use a cos φ weighting).

But the area of the grid cells that MBH use is proportional to cosine
latitude and not to the square root of cosine latitude so I posted a
comment
suggesting that von Storch was mistaken.

Steve McIntyre then pounced on
my comment, presenting evidence that von Storch was correct. He even
stated that my comment was more worthy of criticism than McKitrick’s
mixing up of degrees with
radians
in a journal paper
touted as a bombshell that refuted global warming.

It seems that if you want the output from PCA to be weighted by area,
the input has to be weighted by the square root of area. I don’t know
enough about PCA to know for sure who is correct here, but certainly
von Storch’s criticism has not been refuted, so I
retracted
my comment.
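The square-root argument can be checked numerically. The sketch below is only an illustration under stated assumptions: the five latitude bands and the random anomalies are made up, and this is not the MBH98 or von Storch code. It uses the fact that the PCA eigenvalues sum to the trace of the covariance matrix, so scaling each grid point by a weight scales its share of the variance by the square of that weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grid: 5 latitude bands, 200 time steps of synthetic anomalies.
lat_deg = np.array([0.0, 20.0, 40.0, 60.0, 80.0])
area = np.cos(np.deg2rad(lat_deg))      # relative grid-cell area ~ cos(latitude)
X = rng.standard_normal((200, lat_deg.size))
X -= X.mean(axis=0)                     # centre each grid point

def total_pca_variance(data):
    """Sum of the PCA eigenvalues, i.e. the trace of the covariance matrix."""
    s = np.linalg.svd(data, compute_uv=False)
    return float(np.sum(s**2) / (data.shape[0] - 1))

var_per_cell = X.var(axis=0, ddof=1)

# Weighting the input by sqrt(area) gives an area-weighted variance budget ...
total_sqrt = total_pca_variance(X * np.sqrt(area))
# ... while weighting the input by area gives an area-squared budget.
total_full = total_pca_variance(X * area)

print(np.isclose(total_sqrt, np.sum(area * var_per_cell)))
print(np.isclose(total_full, np.sum(area**2 * var_per_cell)))
```

Both checks should print `True`. This shows only how the weights enter the variance that PCA decomposes; it says nothing about how much the choice changes the final reconstruction.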

Neither von Storch nor McIntyre seems to think that the weighting issue
is very important. Von Storch just mentions it in passing and McIntyre
has not bothered to find out what effect it has on the final
reconstruction.

Nonetheless McIntyre repeatedly demanded that I post a ferocious
denunciation of Mann’s weighting error. He felt that I was obliged to
do this because my single post on McKitrick’s mixing up degrees with
radians when calculating the cosine of latitude meant that I
specialized in cos latitude problems. Now his demand is rather
irrational. Firstly, “cos latitude problems” is a gerrymandered
category engineered to create a false equivalence between McKitrick’s
error of using degrees when he should have used radians in a linear
regression and Mann’s error of not taking the square root of his
weights in an Empirical Orthogonal Function Analysis. Secondly, one
post out of almost 800 on this blog does not make me a specialist on
that topic. Thirdly, even on a topic where I do specialize, like,
umm, Lott, I still don’t have to post on every little move Lott makes.

I explained this to McIntyre, but he insisted that I was this strange
“cos latitude specialist” thing. I don’t think he was doing it to annoy me—he seemed to have completely convinced himself. He then felt entitled to
deliver a stream of jibes and insults, accusing me of hypocrisy, of
being petulant and of being a troll. He does this to others as well,
calling Gavin Schmidt and
Caspar Ammann “Dumb and
Dumber”.

He also falsely claimed that I attributed McKitrick’s degrees/radians
mix up to McKitrick and McIntyre and falsely claimed that my
criticism of Essex and
McKitrick
was “mostly just
belligerence”. Nor would he correct these falsehoods.

If McIntyre’s dealings with climate scientists have been anything like
his behaviour towards me, with his irrational demands and unpleasant
manner, I can certainly understand why they might not wish to
correspond with him.

Comments

  1. #1 Steve McIntyre
    August 22, 2005

    Both R2 and r2 are accepted usages: R2 is the more common usage in econometrics and statistics, while paleoclimatologists and dendrochronologists tend to use r2.

    The Figure in MBH98 is for the AD1820 roster with 112 “proxies”, including 11 actual temperature series. The cross-validation R2 for this step was quite high as indicated in this figure. Mann obviously had no reluctance to disclose and even feature the R2 statistic in a step when it was favorable for his reconstruction.

    The dispute is over the earlier reconstruction, and, in particular, the 15th century step in the stepwise reconstruction with only 22 proxies and no instrumental series as input. Mann has never released a digital version of this step and has refused to disclose the digital reconstruction of this step. Both my emulation and my run-through of the Wahl-Ammann emulation for this step show that the cross-validation R2 of the AD1400 step is ~0.0. The source code shows that Mann calculated this, but did not report it. The absence in the Supplementary Information is quite striking.

    In our GRL article, we showed that simulations on red noise using MBH98 methods led to hockey stick shaped PC1s which generated high RE statistics and R2 of ~0.0 when fitted against NH temperature – a pattern identical to that of the actual MBH98 reconstruction using the 15th century proxy network. Hence the conclusion that this network has no statistical skill.

  2. #2 Steve McIntyre
    August 22, 2005

    Re #96: Dano, you said: “Well, we know Mann claims MBH didn’t use r^2, but hey, let’s not construct strawman arguments, right boys?”

    How do you reconcile Mann’s statement with the Figure in Nature showing r2 statistics?

  3. #3 dsquared
    August 23, 2005

    Michael, you’re also using really quite confusing language about R^2. You write:

    The math failure. “The r2 value is not a statistical test”. Umm. Actually, that’s exactly what it is. It tests for significance. If the point you wanted make was that the r2 value was a sufficent condition for rejecting the null H, but not a neccesary one then say so. But don’t deny that it’s a statistical test. That’s just silly.

    R^2 is a measure of goodness of fit. That’s not the same thing as a “statistical test”, if that phrase is to have its ordinary meaning. Proof; if it were a statistical test, then there would be tables of critical values of R^2 so that people could check whether their estimates passed an R^2 test. There are no such tables because there is no such test. R^2 doesn’t “test for significance” and as I keep pointing out, it is possible to have a correct model with highly significant coefficients which has a low R^2 because it is a model of a noisy process.
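The last point, a correct model with a highly significant coefficient but a low R^2, is easy to reproduce. The sketch below assumes a made-up process (true slope 0.1, unit-variance noise, 10,000 observations) and fits it by ordinary least squares. The numbers are illustrative and have nothing to do with the MBH98 data.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 10_000
x = rng.standard_normal(n)
y = 0.1 * x + rng.standard_normal(n)    # true slope 0.1, buried in unit-variance noise

# Ordinary least squares for y = a + b*x.
xc = x - x.mean()
b = np.sum(xc * (y - y.mean())) / np.sum(xc**2)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)

r_squared = 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)

# Standard error and t-ratio of the slope.
sigma2 = np.sum(resid**2) / (n - 2)
t_ratio = b / np.sqrt(sigma2 / np.sum(xc**2))

print(f"R^2 = {r_squared:.3f}, slope t-ratio = {t_ratio:.1f}")
```

With these settings the fit explains only about one percent of the variance, yet the slope's t-ratio is far above 1.96. The model is right and the process is simply noisy.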

    Stephen McIntyre: I’ve asked this before but I guess it got lost in the thread; could you give the actual number for the R^2 figure in your tests rather than saying “~0.0”, as zero is quite obviously a special number in this case. Also, it strikes me that the R^2 from an exercise like the one you describe would be a more or less meaningless transform of the variance of your red noise. In particular, in such an exercise it strikes me that a zero R^2 is nowhere near the worst that things could get; it is not uncommon in econometric applications to get an out-of-sample fit with a negative R^2 (ie a mean-squared forecast error which is greater than the variance of the data).
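A negative out-of-sample R^2 depends on which definition is meant. The sketch below is an assumption-laden illustration, not anyone's actual verification code: the function names and the five data points are hypothetical. It contrasts the goodness-of-fit form, 1 minus mean-squared error over variance, with the squared correlation coefficient.

```python
import numpy as np

def r_squared_fit(observed, predicted):
    """1 - MSE / variance: the goodness-of-fit form, which can go negative."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mse = np.mean((observed - predicted) ** 2)
    return 1.0 - mse / np.var(observed)

def r_squared_corr(observed, predicted):
    """Squared Pearson correlation: always between 0 and 1."""
    return float(np.corrcoef(observed, predicted)[0, 1] ** 2)

observed = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
forecast = np.array([5.0, 4.0, 3.0, 2.0, 1.0])   # hypothetical forecast, exactly backwards

print(r_squared_fit(observed, forecast))    # -3.0: forecast error is 4x the variance
print(r_squared_corr(observed, forecast))   # close to 1, despite the useless forecast
```

The same forecast scores -3 on one measure and about 1 on the other, so a quoted "R2" needs its definition stated before zero can be read as the floor.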

  4. #4 John Quiggin
    August 23, 2005

    A couple of asides:

    1. Economists also use adjusted R2 measures ( Rbar^2), for which the critical value is zero.

    2. Economists got used to high R2 values in macroeconomics because most of the early analysis involved time series with a common trend, but other analyses on large data sets from household surveys and so on produce R2 close to zero, even though coefficients of interest are highly significant. I imagine the climate series are closer to the latter case.

  5. #5 ÐanØ
    August 23, 2005

    Dano, you said: “Well, we know Mann claims MBH didn’t use r^2, but hey, let’s not construct strawman arguments, right boys?”

    How do you reconcile Mann’s statement with the Figure in Nature showing r2 statistics?

    A. Steve, I was trying to nail down per on anything so I could show that he either hadn’t read the Nature or was…er…misrepresenting the paper (it takes a while, as I found out on sci.env, to figure out a tactic to this end).

    I should have done a better job at constructing that sentence, since I knew MBH98 used r2 in the paper. Apologies for the confusion.

    B. That said, I don’t know the answer to your question. I suspect the initial work was tested with r2 and sometime later they learned RE was a better test.

    As you no doubt are keenly aware, the MBH98 was an early work. Subsequent work has improved upon it (as is often the case) yet the overall conclusion still stands, as the evidence from later papers indicates. Certainly the error bars were useful in the depiction, as some of the MBH98 uncertainty looks pretty good in later depictions.

    Knowledge marches on!

    Best, sir,

    an

  6. #6 ÐanØ
    August 23, 2005

    Both R2 and r2 are accepted usages: R2 is the more common usage in econometrics and statistics, while paleoclimatologists and dendrochronologists tend to use r2.

    How quickly one forgets. You are correct Steve.

    My training is in the natural sciences, which uses r2, and we see r2 all the time. I also see I used R2 in an Urban Econ class not that long ago calendrically, but ages ago mentally.

    BTW, your glacier discussion in the comments is fascinating.

    Best,

    an

  7. #7 dsquared
    August 23, 2005

    I hesitate to quibble with Prof. Quiggin but I have elected myself as the “R^2 Terminology Police” here so I have to keep insisting that R-bar-squared, like R-squared, doesn’t have “critical values”; the value of zero is an important one for R-bar-squared because a value greater than zero indicates that there is at least some genuine fit to the model over and above what you would get arithmetically by eating up degrees of freedom, but it isn’t a “critical value” in the sense in which 1.96 is a critical value for a t-ratio. JQ’s substantive points remain.

  8. #8 dsquared
    August 23, 2005

    I yield to no man, btw, not even Tim, in my obsessive nerdish Lambert completism. I’ve just remembered something …

    Here’s an example of how “making a fetish out of r-squared” can lead you up the garden path (note that Tim is gently mocking Lott here with the tables; it is not a serious critique of Lott)

  9. #9 Steve McIntyre
    August 23, 2005

    dsquared – MBH98 certainly used r2 (OK, I’ll use this form here instead of R2), as a test of statistical significance. MBH98 stated:

    “For the r2 statistic, statistically insignificant values (or any gridpoints with unphysical values of correlation r < 0) are indicated in grey. The colour scale indicates values significant at the 90% (yellow), 99% (light red) and 99.9% (dark red) levels (these significance levels are slightly higher for the calibration statistics which are based on a longer period of time).” The color code shows that they treated 0.06 as being 90% significant; 0.14 as 95% and 0.20 as 99% significant.

    Their usage of the term statistical significance follows dendrochronological practices in Cook et al [1994] and Fritts [1976; 1990], and is a little different than one’s used to in general statistical literature, but can be followed. They use RE in a similar way. What’s unique about MBH98 is the selectivity of the reporting of verification statistics.

    dsquared: we reported the following values of the verification statistics standard in the trade for our emulation of the 15th century step of MBH98 (additional to the RE statistic) – R2: 0.02; CE: -0.26; Sign Test: 22/48; PM Test: 1.54. I’ve modified the emulation a little since then to adjust a scaling step after seeing Wahl-Ammann code (where we were virtually identical in construction of RPCs). The verification statistics for my run-through of the Wahl-Ammann version were: R2: 0.02; CE: -0.24; Sign Test: 0.54; PM: 0.91.

    As noted above, the statistical terminology of MBH98 and dendrochronological literature is a little idiosyncratic and I’ll try to post a note up at climateaudit on reconciling it to more usual statistical terminology of null hypotheses, with a view to discussing exactly what the null hypothesis is.

    BTW I’ve discussed the MBH98 confidence intervals at climateaudit a few months ago. There’s a lot of hair on these calculations as well.

  10. #10 Dano
    August 23, 2005

    Hey Steve, can you kill the spam filter on my IP addy in your comments?

    For some reason, it flagged me. No idea why. I’m being overly considerate to the Posse™ today, since it’s your house an’ all.

    Thanks!

    D

  11. #11 Steve McIntyre
    August 23, 2005

    Interesting way to communicate. The Spam Karma tends to penalize new posters. I’ll check on it. I agree that you’ve been quite civil while you’ve been visiting. I appreciate the compliment about the glacier posts.

  12. #12 Dano
    August 23, 2005

    Still no good, Steve, at ~0045GMT.

    Off on another vacation, cheers,

    D

  13. #13 John McCall
    August 25, 2005

    re: 92 Ely Rabett

    “it does not matter if a forcing is natural, supernatural or a result of people’s actions.”

    Of course it matters, the first two forcings are largely unaddressable save questionable actions such as blocking solar radiance before it reaches earth. In addition, if AGW forcings are minimal in contribution, then an already minimally effective AGW-reversing initiative like Kyoto “would be less effective than thought.”

    This brings me to your second point, my preference toward Moberg’s reconstruction or von Storch’s modeling isn’t the issue; it’s Esper J, Wilson RJS, Frank DC, Moberg A, Wanner H, Luterbacher J (in press) Climate: past ranges and future changes. Quaternary Science Reviews.

    They are the ones calling for updating the proxies (adding to Steve McIntyre’s voice), among other things to address the “amplitude puzzle” of recent (including MBH’99) reconstructions:

    “data from the most recent decades, absent in many regional proxy records, limits the calibration period length and hinders tests of the behaviour of the proxies under the present ‘extreme’ temperature conditions. Calibration including the exceptional conditions since the 1990s would, however, be necessary to estimate the robustness of a reconstruction during earlier warm episodes, such as the Medieval Warm Period, and would avoid the need to splice proxy and instrumental records together to derive conclusions about recent warmth.

    So, what would it mean, if the reconstructions indicate a larger (Esper et al., 2002; Pollack and Smerdon, 2004; Moberg et al., 2005) or smaller (Jones et al., 1998; Mann et al., 1999) temperature amplitude? We suggest that the former situation, i.e. enhanced variability during pre-industrial times, would result in a redistribution of weight towards the role of natural factors in forcing temperature changes, thereby relatively devaluing the impact of anthropogenic emissions and affecting future predicted scenarios. If that turns out to be the case, agreements such as the Kyoto protocol that intend to reduce emissions of anthropogenic greenhouse gases, would be less effective than thought. This scenario, however, does not question the general mechanism established within the protocol, which we believe is a breakthrough.”

    While we’re waiting for AGW proponents to retire MBH’99 from the Wikipedia plot you all are so fond of posting, we can update the proxy records so the training period is enhanced up to and including the “warmest decade in the millennium?”

    Oh, and I’ve checked RealClimate and ClimateAudit – doesn’t appear that consideration has been given to this Jul’05 accepted paper yet — perhaps if/when it’s published? Scott Church seems to have insight into the latest RealClimate positions – maybe he knows if the paper is under consideration of their blog?

  14. #14 John McCall
    August 25, 2005

    Oh and Dan0 —

    You should read the paper as well. You and our learned host are so fond of throwing out and (IMO) misreading the millennium relevance of that Wikipedia reconstruction summary; and that includes the last 10-35 years of those proxy vs. instrument record of the plots. The paper will give some things to think about “splicing” …

  15. #15 cytochrome sea
    August 26, 2005

    See the Huybers comment about the dendroclimatic proxy correlation to local temperature measurements. (he was referencing a ?Jones? paper iirc)

  16. #16 cytochrome sea
    August 26, 2005

    Wait; I reread it, scratch that last post.

  17. #17 Steve Bloom
    August 26, 2005

    Where is this paper available?

  18. #18 Bob
    August 26, 2005

    RE #113

    The article was published 8/10/05 on Science Direct.

    “Persisting controversy (Regalado, 2005) surrounding a pioneering northern hemisphere temperature reconstruction (Mann et al., 1999) indicates the importance of such records to understand our changing climate. Such reconstructions, combining data from tree rings, documentary evidence and other proxy sources are key to evaluate natural forcing mechanisms, such as the sun’s irradiance or volcanic eruptions, along with those from the widespread release of anthropogenic greenhouse gases since about 1850 during the industrial (and instrumental) period. We here demonstrate that our understanding of the shape of long-term climate fluctuations is better than commonly perceived, but that the absolute amplitude of temperature variations is poorly understood. ”

    “When matching existing temperature reconstructions (Jones et al., 1999; Mann et al., 1999; Briffa, 2000; Esper et al., 2002; Moberg, et al., 2005) over the past 1000 years, although substantial divergences exist during certain periods, the timeseries display a reasonably coherent picture of major climatic episodes”

    I guess I don’t understand your bluster. The article argues proxy reconstructions are good (including MBH) but could be refined. Not really big news.

  19. #19 JohnMcCall
    August 27, 2005

    re: 118 Oh you mean, in addition to difficulty reading/understanding the summary quotation cited in 113, you also didn’t grasp what the quote, “although substantial divergences exist during certain periods” meant from your own citing?

    TRANSLATION:

    “substantial divergences … certain periods” = Mann’99 during MWP. Please study Figure 1, just estimate what happens if Mann’99 is dropped from the averaging. Even though it’s the steepest at the business end of the hockey stick, dropping Mann’99 would have less effect on that end (1990s instrument data warming); but the MWP proxy period average would rise significantly. Because of its icon-status, Esper and Moberg (like von Storch before them) realize the hockey stick must be included (even if it’s “rubbish”) – and its inclusion sure dampens the average amplitude of the MWP end of the millennium, while making the modern end steeper?

  20. #20 Bob
    August 27, 2005

    I understand the paper well, but then I am not the one making irrational demands of it or its authors. Nor am I the one using my magical mystery powers to read deeply into & “translate” a mundane paper in an obscure journal.

    The paper is not introducing the theory of relativity, it summarizes the current lit and sets out a generalized research agenda. Yawn.

  21. #21 JohnMcCall
    August 31, 2005

    I “demand” nothing — it’s obvious for those who read and understand, that the summary statement is the more definitive of the citations.

    Had you taken some time to analyze figure 1 before you posted your quote, you would have seen the clear reference to Mann’99. There are others in the figure, but your intellectual laziness drove you to swing wildly, posting of “demands” and “magic,” rather than drawing on modest observation and insight to post along that line.

    But you’re right about one thing (although you didn’t go far enough); as far as you’re concerned, everything about the article is obscure.
