BEST is published

It looks like the first of the BEST papers is published (webcite): A New Estimate of the Average Earth Surface Land Temperature Spanning 1753 to 2011 (h/t WUWT) - Richard A. Muller, Robert Rohde, Robert Jacobsen, Elizabeth Muller, Saul Perlmutter, Arthur Rosenfeld, Jonathan Wurtele, Donald Groom and Charlotte Wickham. Note the absence of La Curry (she's noticed, though; note also the absence of any comment on journal quality).

AW has thrown Muller under the bus and is cwuel to the paper, which is almost enough to make me kind, but not quite. The audience duly parrots this back to him, with a few exceptions. "oldfossil" reminds AW of his original words: "I'm prepared to accept whatever result they produce, even if it proves my premise wrong."

The Watties are criticising the paper for being published in the first issue of a new journal of no known provenance, "Geoinformatics & Geostatistics: An Overview". And I agree to this extent: BEST would not have published there, had they been able to publish in, say, JGR. Just like no-one publishes in E&E until everyone else has rejected them. But I disagree with the reasoning: the work, I think, is perfectly valid. As I said before. But just as I said before, it isn't really all that exciting: it's really just a new method for constructing a global time series, which agrees with the previous ones. Even if it turns out to be a really good idea, it's still only a method, not a scientific result. That is why they've had a hard time getting it published. This is a bit of a problem for geophysics-types, and there's an EGU journal devoted to this problem for GCM code, GMD, featuring several of our usual suspects. That exists because developing GCM code, too, is just a method and not generally publishable in the normal journals.
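
For concreteness: the core of their method is essentially kriging of the station data (Mosher says as much below). A minimal sketch of ordinary kriging, assuming a simple exponential covariance and made-up station values - an illustration of the technique only, not the BEST code:

```python
# Minimal ordinary-kriging sketch: illustrative only, not the BEST code.
# Assumes an exponential covariance with a made-up length scale (km).
import numpy as np

def cov(a, b, length_scale=500.0, sill=1.0):
    # Covariance between two sets of points, decaying with distance.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / length_scale)

def ordinary_krige(xy_obs, z_obs, xy_new):
    n = len(xy_obs)
    # Kriging system: station-station covariances, bordered by ones to
    # enforce that the weights sum to 1 (the unbiasedness constraint).
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = cov(xy_obs, xy_obs)
    K[n, n] = 0.0
    k = np.ones((n + 1, len(xy_new)))
    k[:n, :] = cov(xy_obs, xy_new)
    w = np.linalg.solve(K, k)      # weights plus Lagrange multiplier
    return w[:n].T @ z_obs         # weighted sum of observed anomalies

# Toy example: three "stations" (coordinates in km) and one target point.
stations = np.array([[0.0, 0.0], [100.0, 50.0], [400.0, 300.0]])
anoms = np.array([1.2, 0.8, -0.3])  # temperature anomalies, degC
print(ordinary_krige(stations, anoms, np.array([[50.0, 25.0]])))
```

The real thing has to cope with fragmented records, inhomogeneities and a spatially varying climatology, which is where all the work is.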

One possible "new result" is the extension of the temperature record back to 1750. What is striking about the early record is the massive oscillations with a period of about 50 years, that completely disappear from the record after about 1910. Is that plausible? Maybe: they claim to see a volcanic signature in these events. But they have no explanation for the large positive excursion in ~1770 (aside: this may be why they had trouble getting into a "real" journal: they say stuff like Most dramatic are the large swings in the earliest period. These dips can be explained as... but they don't mention the peaks, which can't be). Another possibility is that these are an artefact of the poor geographical coverage of the early record. Exactly when to start the record isn't clear, but other centers use 1850-to-1880.

The other thing they do (inextricably tied up with the above) is fitting volcanoes+CO2 to the temperature record. This gives them a climate sensitivity (of 3.1 ± 0.3°C, since you ask) and although they say "this parameterization is based on an extremely simple linear combination, using only CO2 and no other anthropogenic factors and considering only land temperature changes. As such, we don't believe it can be used as an explicit constraint on climate sensitivity other than to acknowledge that the rate of warming we observe is broadly consistent with the IPCC estimates of 2-4.5°C warming (for land plus oceans) at doubled CO2", this effective endorsement of the IPCC does force the Watties to revile BEST from now on. Previously, I called this stuff "absurdly naive", which on reflection I'd tone down to "naive", but I still think they'd have had a hard time getting it through JGR et al.
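
As far as I can tell the fit really is that simple: least-squares of land temperature against log2(CO2) plus a volcanic term. A hedged sketch of that sort of fit, with synthetic series standing in for the real data (my reconstruction of the idea, not their code):

```python
# Sketch of a BEST-style "extremely simple" fit (my reconstruction of the
# idea, not their code): T(t) ~ a*log2(CO2(t)/278) + b*volcanic(t) + c,
# where a is then the land warming per doubling of CO2.
import numpy as np

def fit_simple(temp, co2_ppm, volcanic):
    """Least squares; returns (per-doubling warming a, volcanic coeff b, offset c)."""
    X = np.column_stack([
        np.log2(co2_ppm / 278.0),   # doublings of CO2 over preindustrial
        volcanic,                   # volcanic forcing index (e.g. optical depth)
        np.ones_like(co2_ppm),      # intercept
    ])
    coeffs, *_ = np.linalg.lstsq(X, temp, rcond=None)
    return coeffs

# Synthetic demo: fabricate a record with a known a = 3.1 and recover it.
rng = np.random.default_rng(0)
t = np.arange(260)
co2 = 278.0 * np.exp(0.002 * t)            # toy CO2 growth
volc = -1.5 * (rng.random(260) < 0.02)     # occasional eruption "forcing"
temp = 3.1 * np.log2(co2 / 278.0) + 0.8 * volc + 0.1
print(fit_simple(temp + rng.normal(0.0, 0.1, 260), co2, volc))
```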

I wrote this last night, but didn't save it: While writing this my trawling threw up the odd silence over AW's own draft paper, ludicrously called "game changing" by RP Sr. Well, in a sense the game has changed: RP Sr is now out of the blogging business.

Update: JA is even less impressed than me: "Previously submitted to JGR, it has ended up...as the first article in a newly-created fake vanity press 'journal'. That's a few rungs down from just sticking it on the Arxiv, where (1) it is free (2) at least some people will read it (3) no-one will think you are pretending it's undergone any meaningful peer review. No wonder Curry has pulled her name from it. The surprise is that the others have not."

Noting that, and some of my commenters, I think I should revise my suggestion that it was hard to publish as a "methodological" paper. Perhaps more plausible is that it was hard to get into JGR as such, and the "attribution" element wasn't liked. So perhaps Muller lacked patience to try a lower-ranked journal, threw a wobbly and said "just get me this thing published somewhere where the refs won't be too pesky". Ter be 'onest wiv yer guv, it's hard to understand.

Steven Mosher is over at Curry's defending BEST (well, it's his post over there, not hers). So we have stuff like:


Kip Hansen> So tell us all please — Why was this paper published in this shockingly obscure, brand-new journal? Was it actually Peer-Reviewed (notice the initial caps please) — was it really sent out in its entirety to at least three world-class respected experts in the necessary fields, let’s say climate and statistics and computer modelling for instance, and thoroughly vetted, revised, etc before publication? Or was it reviewed by a single editor? and if so, whom?


Steven Mosher>

1. Why was it published? The editor and the reviewers thought it was important work and good work.
2. Was it Peer Reviewed. Yes. There were three reviewers. I read the reviews and then checked our final draft to make sure that we addressed the points that we thought needed to be addressed.
3. Was it sent out in it’s entirety? Yes. I prepared the final draft.
4. Was it sent to 3 world class experts in climate and stats? The reviewers identities are not revealed so that I can only infer from their comments. They understood what we were doing and made helpful suggestions. This was in contrast to previous reviewer comments at other journals who seemed to struggle with kriging, so a geostats journal seemed the better fit.

I can assure you from personal first hand knowledge that “Muller didn’t even know about the journal until it was presented as an option.”


So what you're seeing there is SM not-very-subtly dodging the "Why was this paper published in this shockingly obscure, brand-new journal?" However, when pressed again he finally answers:


1. prestige didn’t matter to guys who have nobel prizes already.
2. history? we enjoyed the idea of setting a standard. being first was an honor.
3. Recognition? only seems to matter to skeptics who argued that peer review wasnt important anyway.

Basically, we liked the idea of being judged on the quality of the science by people not tainted by the kind of nonsense we have seen in other places.


These are not believable answers. There's a thoughtful comment by Jim D:

The failure to get the BEST paper published in a climate science journal goes against the skeptical view that pro-AGW papers always get published because of inside deals. It is not that simple. This speaks well of the filtering process in those journals and against bias.

Although the follow-up pointing out that Muller has pissed off the world is reasonable too.

Refs

* Yellow journalism
* Scary and funny: fake researcher Peter Uhnemann on OMICS group Editorial Board #JournalSPAM


I do take issue with the journal: "Geoinformatics & Geostatistics: An Overview". This sounds like a journal that has no expertise in the analysis of climate data. They might be able to check the mathematics (geostatistics), but are no experts on climate and thus on the assumptions used. There are sufficient climatological journals with lower impact factors than JGR and co.

[Can't disagree there. I had a quick check of their editorial board and there is no-one I recognise -W]

It is a typical strategy of climate "sceptics" to publish in non-related journals that do not know the most appropriate reviewers who would be able to do a good review. It is my impression that almost all seriously flawed "sceptics" papers were published that way.

I also do not agree with the idea that methodology is not important and that methodological papers are more difficult to get published. I mainly work on methodological problems and never had problems getting published because of that. Half of scientific progress is due to improvements in methods, I would argue.

[Hmm. OK, perhaps I'm wrong, though I'm sure this is a complaint I've heard before. Could you give a couple of examples? -W]

By Victor Venema (not verified) on 20 Jan 2013 #permalink

Pielke Sr is going to spend more time with his publications - I'm sure that'll learn you.

IEHO WUWT has always been RPS's downmarket blog. Kind of like the News of the World and the Times in Murdochland.

By Eli Rabett (not verified) on 20 Jan 2013 #permalink

Interesting that this paragraph made it into the paper:

'Some of the climate models predict that the diurnal temperature range, that is, the difference between Tmax and Tmin, should decrease due to greenhouse warming. The physics is that greenhouse gases have more impact at night when they absorb infrared and reduce the cooling, and that this effect is larger than the additional daytime warming.'

Several months ago I noted to Zeke Hausfather that this is wrong. Climate models forced with greenhouse gas increases alone don't produce a significant DTR change.

Regarding the climate sensitivity estimate, I'm pretty sure their estimate is for a land-only sensitivity value. No idea how that would translate to land+ocean ECS.

There was a 'year without a summer', 1816 CE, following the large 1815 Tambora eruption. So that is one of the downswings. Dunno about the others.

The main thing to note is the large error bars at the beginning. So one can easily avoid a high dT and still stay within the grey area.

By David B. Benson (not verified) on 20 Jan 2013 #permalink

The slumming as regards the publication venue is a head scratcher.

The basic findings are certainly publishable in a mainstream journal (not necessarily Science or Nature). It's not at all necessary to have a new and groundbreaking result in order to get published; confirming and extending a broadly-known result using a somewhat different method is pretty typical of published papers. One would think they'd be able to publish a short contribution (GRL, say) that would get their major result on the record and give the basic outlines of how they did it. Then they could follow up with other papers that analyze things in more detail. That's how it's often done.

So why did they do this? Was Muller so insistent on getting some of his speculative interpretations published that he was willing to go this far downmarket to do it?

We wonder.

By American Idiot (not verified) on 20 Jan 2013 #permalink

> dips
Just speculating, I wonder if there are paleo proxies for this kind of bounce?

http://www.sciencedaily.com/releases/2010/10/101006094059.htm
Volcano Fuels Massive Phytoplankton Bloom

Oct. 7, 2010 — Advocates for seeding regions of the ocean with iron to combat global warming should be interested in a new study published October 6 in Geophysical Research Letters. A Canada-US team led by University of Victoria oceanographer Dr. Roberta Hamme describes how the 2008 eruption of the Kasatochi volcano in the Aleutian Islands spewed iron-laden ash over a large swath of the North Pacific. The result, says Hamme, was an "ocean productivity event of unprecedented magnitude" -- the largest phytoplankton bloom detected in the region since ocean surface measurements by satellite began in 1997.

By Hank Roberts (not verified) on 20 Jan 2013 #permalink

"3. Was it sent out in it’s entirety? Yes. I prepared the final draft."

...but I don't see Mosher's name on the paper.

[See Curry's blog, where he explains that, or at least discusses it -W]

By Quiet Waters (not verified) on 21 Jan 2013 #permalink

"One possible “new result” is the extension of the temperature record back to 1750."

Do we know that they have new data there? In the data released with the first versions of their papers, they didn't.

[I don't know. It may just be that their method is happier to work with the fragmentary data that is available then. Or it may be they are less picky about how much they have. Mercury thermometers were only manufactured from 1724. And early exposures could be poor -W]

By Nick Stokes (not verified) on 21 Jan 2013 #permalink

I think the Stoat is right that their method is able to use the sparse and bitty early data to produce a result whereas others require lengthier records with better geographical coverage. Doesn't mean the result they produce is right though.

Even if they did have more data their geographical coverage wouldn't be any better, so accuracy can only come through greater confidence in the recorded temperatures in regions that were covered (pretty much just Europe and East Coast of N. America).

Given the huge uncertainties I can't see what use there is in the early record anyway. Temperatures in the 1770s were either about the same as today or 2ºC cooler.

As an aside, I took a peek at their regional result for Antarctica from 1957-2006, given that their kriging method using surface stations is nominally the same as that used by O'Donnell et al. I found a trend of 0.09ºC/Decade, which is half-way between Steig and O'Donnell if I remember correctly.

I am not surprised that they had difficulty in passing review in a mainstream climate journal. I think there are interesting methods (new at least to the global surface temperature field, which is otherwise pretty primitive), a good new source dataset (much of which is also now in Peter Thorne's new dataset, which didn't exist when BEST did most of their work), and - as a result - a fair resulting dataset. I would have rejected it with some minor comments (mainly that some of their analytical conclusions, particularly for the earlier years, are pretty speculative and should be better labelled as such, and also that their source code, although available, was not really runnable - missing some necessary configuration code), but I think a small amount of fixing-up would have made it fine, and even the original draft is better than a lot of stuff that does get printed (including in JGR-A). But there is still a lot of bad feeling about Muller; and I suspect that some of the obvious choices for reviewer were never going to let this through in a month of Sundays. So my second stop would have been at a journal like PLOS One, or maybe one of the open review journals. A shame that they've ended up where they have.

[I'd have encouraged them to try one of the EGU open-review journals. It would have been good all round - we could have seen the reviewers comments (already rumblings Elsewhere about "so just what did the JGR reviewers say") - and it would have fitted in nicely with the "we're all open" type idea -W]

By Nick Barnes (not verified) on 21 Jan 2013 #permalink

Given Thorne's new data set they had to get theirs published first, no matter where. They were also in a bind wrt other manuscripts of theirs which were accepted on the condition that the data set be published.

[That last is a good point, I've seen it said elsewhere they were waiting on that. So perhaps they don't care - this just has to be out, anywhere. It's the other papers they care about -W]

By Eli Rabett (not verified) on 21 Jan 2013 #permalink

" Another possibility is that these are an artefact [sic] of the poor geographical coverage of the early record."

You think?

By luminous beauty (not verified) on 21 Jan 2013 #permalink

Paul S,

Technically that paragraph is correct when it says "some of the climate models...". It was stronger wording than I would have liked but, well, I'm not an author on the paper and just provided advice :-p

The journal certainly isn't ideal, but the underlying work (in large part by Robert Rohde) is solid. Now that the gridded data is up I expect a lot of paleo folks to start using it for proxy calibration, as it extends significantly further back than Hadley data for some locations.

I'm pursuing some interesting work with Matt Menne and Claude Williams comparing the Berkeley homogenization results with those of NCDC. The poster was our first glance, but we will probably write something up for a short paper (maybe GRL?) this year: http://berkeleyearth.org/images/agu-2012-poster.png

[Hello, and welcome. "The journal certainly isn't ideal" - no, that's an evasion. I'm sorry, but it's obvious something is being hidden here. I'm not sure I care that greatly, but it does taint any claims to openness.

As to the substance: well, people will have to look at it and decide if they do think it's solid. Whether the sparse early stuff can be relied on is a point that needs addressing -W]

By Zeke Hausfather (not verified) on 21 Jan 2013 #permalink

Zeke said: "I expect a lot of paleo folks to start using it for proxy calibration, as it extends significantly further back than Hadley data for some locations."

In many cases this will require using a peer reviewed data set, which was one of the things that Eli thought of. There may be a small avalanche of papers waiting on getting the data set published.

By Eli Rabett (not verified) on 21 Jan 2013 #permalink

On the one hand one might expect a paper with a recent Nobelist co-author to get published easily, but OTOH it's pretty obvious that he was mostly just lending his name as a favor to a colleague, so the piss-offedness remained the key factor.

By Steve Bloom (not verified) on 21 Jan 2013 #permalink

WC: "Hmm. OK, perhaps I'm wrong, though I'm sure this is a complaint I've heard before. Could you give a couple of examples?"

Well, okay you do not get into Nature or Science with a methodological paper. Those complaints you heard may explain why some methodological papers have one or two plots at the end with some results, which normally do not add much value to the paper.

I could link to almost all my articles, as I said I like methodological problems, let's just link to the list.

The most important methodological paper for this community would be the recent one on a blind validation of homogenization algorithms published in Climate of the Past with an announcement in the Bulletin of the American Meteorological Society.
- A new way to determine a 3D predictor domain for downscaling in the Journal of Climate.
- A downscaling method to couple atmospheric models with higher resolution surface models in Tellus.
- A new downscaling method for 3D cloud in the Quarterly Journal of the Royal Meteorological Society.
- A study on the influence of inhomogeneities on estimates of long range dependence in the Journal of Geophysical Research.
- And finally using a new stochastic modelling approach (surrogate data) to generate 3D cloud fields based on measured statistical cloud properties in Tellus.

Thus just being a methodological paper is not a sufficient reason to publish this BEST paper in such a journal. It thus looks as if the reviews of the first journal(s) revealed problems that could not be easily solved and that a climatological reviewer would notice.

One major problem I see is that the global temperature computed is qualitatively different at the beginning and at the ending of the dataset. In the beginning it is mainly the temperature in Europe, in the end it truly is a global temperature. This probably also explains why the variability is so much larger in the first part. Even if the math is right, this makes it a confusing plot to the reader.

Furthermore, looking at the plot, I would expect that the uncertainty estimate is too optimistic and that the real uncertainty is larger. I have not read the paper, but it makes me wonder whether they assumed that they have detected every inhomogeneity and that there are no biases in the trend due to inhomogeneities.

By Victor Venema (not verified) on 21 Jan 2013 #permalink

William,

To be honest, I don't actually know how/why that specific journal was chosen. I've only been able to attend two meetings over the last 4 months or so due to work conflicts (climate science, alas, is not my day job), so I'm a bit out of the loop. No evasion intended.

From what I gathered, most of the JGR reviewer complaints were of the nature of not completely understanding / not sufficiently citing prior work done in the literature, something you would rather expect when a group of (somewhat arrogant) physicists go outside their respective fields.

Steve Bloom,

Saul is an active participant in almost all the meetings, not just a "lent name".

By Zeke Hausfather (not verified) on 21 Jan 2013 #permalink

How is this so different from:

"Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 data set"

JGR 2012 Morice et al.

http://onlinelibrary.wiley.com/doi/10.1029/2011JD017187/abstract

It is largely methodological, it is in JGR.

As I recall, the HadCRUT series have been supported by a number of published papers. Is it because the original paper for the HadCRUT series was novel and this is an update?

Alex

[I think you're probably right. See my update (and VV's comment just above). It *is* harder to get methodological stuff in, but I was over-stressing the difficulty. Those authors are all Met Office / CRU people, so they know the ropes. They know what to write, and they know how to talk to referees. And I doubt they have any dodgy attribution in. By contrast, Muller has pissed everyone off and has massively oversold the results in advance -W]

By Alexander Harvey (not verified) on 21 Jan 2013 #permalink

"2. Was it Peer Reviewed. Yes. There were three reviewers. I read the reviews and then checked our final draft to make sure that we addressed the points that we thought needed to be addressed."

Haven't seen any comment yet on this one. Is it because everyone knows it's a joke and pulling them up on Peer Review is like shooting fish in a barrel?

By Alan Woods (not verified) on 21 Jan 2013 #permalink

Choosing to publish in Volume I No. 1 of this newfangled publication has to be considered in the light of Wegman's new pal reviewed statistical review journal having crashed and burned on account of his group's plagiarism problems.

The overall effect seems to have been to raise the number of science journals Watts has never heard of from ~35,000 to ~35,001.

By Russell Seitz (not verified) on 21 Jan 2013 #permalink

See the 4th graph in the summary that WMC linked to.

1) WMC: "Previously, I called this stuff “absurdly naive” which on reflection I’d tone down to “naive” but I still think they’d have had a hard time getting it through JGR et al.."

Although a fair comment, I actually think this was OK, even if it was behind the state of the art. I often think there is insight to be gained from examining model hierarchies and seeing how good/bad simpler models actually do compared to complex ones that do better. [This arises often in science ... and in computer systems design, where sometimes simpler models are useful enough and run faster.] I certainly prefer IPCC Fig 6.13 (6.14 in printed version), not just for the forcings, but for the fuzzy chart that is a far better visualization than a line with a grey error zone. As in the various versions of MBH99, some people really know that a line in the middle of a large error zone is nowhere near as strong an indicator as that same line inside a small error zone. Many people don't, unfortunately.

2) But, back to the 4th figure. While the 1770s upward excursion of the black (reconstruction) line is well within the large error zone, and might be attributable to lack of volcanoes for a few decades (see 6.13 above), BEST's own figure gives a useful comparison, as the 1770s excursion is the biggest departure from the (relatively smooth) CO2 element punctuated by volcanic dips. Of course, if those earlier records are dominated by N.Atlantic/Europe, where snow-albedo feedback works both ways, one would expect wilder swings.

3) It's always useful to check against CO2, as per Law Dome, 2000 years, and CH4 (Sapart et al.).

By John Mashey (not verified) on 22 Jan 2013 #permalink

OK, Zeke, fair enough. But this subject is very distant from his field, and a reviewer might wonder what he really contributed.

By Steve Bloom (not verified) on 23 Jan 2013 #permalink

Hi Eli, Zeke (#15, #16),

A few stray thoughts on proxy calibration and BEST:

1. I'd expect the pre-1850s data would be used, if anything, for an extra sort of validation, not calibration, given the uncertainties in those earlier data.

2. As Nick (#10) points out, these aren't newly uncovered data -- most of the early instrumental data used here were available and in GHCN (with some exceptions). BEST's innovation was how they were combined (particularly given gaps and fragments), and now there is a gridded product. That said ...

3. Nearly every station 1800 to 1850, for instance, is in the UK and Europe, with one eastern US station (New Haven, I believe) and a station in India (some of these have large gaps - Manchester for instance looks like it spans 1794 to 1864, then picks up again in the 1950s?). So the pre-1850s BEST composite is largely a European one -- you couldn't do a global proxy calibration/validation against it. You could however do some local or European regional calibration/validation using this set of stations, and now there are gridded data, as Zeke points out, which potentially make an appealing reconstruction target. But some of those gridded data do reflect a single station for some years in North America and Asia.

4. The existing long temperature records in Europe have been used before to look at proxy-climate relationships or do additional validation or look at proxy-climate stability over time, for example Klingbjer and Moberg's 2003 Tornedalen composite (also used by Grudd 2008 and others), the long Alpine temperature records (e.g. Frank et al. 2007), and Edinburgh (e.g. Hughes et al. Nature 308, 341 - 344, 1984).

By Kevin Anchukaitis (not verified) on 23 Jan 2013 #permalink

@Victor Venema: As an expert who has compared the quality, pros and cons of homogenizations of time-series data (e.g. land station temperature data), do you consider the SNHT method (Alexandersson, 1986) and its modern derivatives (e.g. Menne & Williams, 2007) the best possible choice for temperature data homogenization of the GHCN database?

So the guy finds ten times as much gridded data that shows exactly what 1/10 of the data had already made clear: that there was an atmospheric warming spike until 1998, and wonders why he has to publish in an upstart journal when atmospheric temperature has been flat ever since.

Perhaps in another 15 years he can find ten times as much data gridded and corroborating the three satellite groups who have already made it abundantly clear that the atmospheric temperature has been flat ever since?

[Oh dear, another selective reader. Mind you, finding 3 satellite records is impressive. RSS, UAH and... remind me what the third one is, again? -W]

By Gordon Lehman (not verified) on 23 Jan 2013 #permalink

Might compare BEST with figure 7 of

The extra-tropical Northern Hemisphere temperature in the last
two millennia: reconstructions of low-frequency variability
B. Christiansen and F. C. Ljungqvist
Clim. Past, 8, 765–786, 2012
www.clim-past.net/8/765/2012/
doi:10.5194/cp-8-765-2012

While the scales are very different I take this as reasonable agreement for the pre-1850 period.

By David B. Benson (not verified) on 24 Jan 2013 #permalink

One has to be very careful with comparisons.
The legend for every spaghetti graph (like C&L Fig 6) or equivalent needs to carefully label what the graph claims to represent:
1) land or land-ocean
2) What part of the Earth:
SH+NH
NH
NH extratropic (usually 30degN-90, but occasionally 23.5degN)

C&L is 30degN, which is 50% of the NH.
BEST is supposed to be global land.

By John Mashey (not verified) on 24 Jan 2013 #permalink

freddy asked: "...do you consider the SNHT method (Alexandersson, 1986) and its modern derivatives (e.g. Menne & Williams, 2007) as the best possible choice for temperature data homogenizations of the GHCN database?"

SNHT would not be appropriate to homogenize a global dataset such as the GHCN for two reasons. It is a manual method that needs expert supervision and can otherwise produce a mess. And SNHT does improve temperature data, but modern methods do so a lot better.

The pairwise homogenization algorithm (PHA) of NOAA shares the detection part with SNHT, but is otherwise a completely different method. In a European project (HOME) we have benchmarked homogenization methods. PHA was one of the 5 recommended methods and the only one that is able to homogenize a global dataset. Still, while PHA was clearly better than SNHT, others were still somewhat better. Hopefully, they will develop their algorithms further so that they can be applied to a global dataset.
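
For concreteness, the SNHT detection statistic itself fits in a few lines; a toy sketch (my simplification, not the operational GHCN/PHA code):

```python
# Toy sketch of the SNHT detection statistic (Alexandersson 1986 flavour;
# a simplification, not the operational GHCN/PHA code). For a standardized
# candidate-minus-reference series z of length n, the statistic at a
# putative break k is T(k) = k*mean(z[:k])^2 + (n-k)*mean(z[k:])^2, and a
# break is declared at argmax T if the maximum exceeds a critical value.
import numpy as np

def snht(z):
    z = (z - z.mean()) / z.std()
    n = len(z)
    T = np.array([k * z[:k].mean() ** 2 + (n - k) * z[k:].mean() ** 2
                  for k in range(1, n)])
    k_best = int(np.argmax(T)) + 1   # +1 because T[0] corresponds to k = 1
    return k_best, T[k_best - 1]

# A series with an artificial 0.8-sigma shift inserted at index 60:
rng = np.random.default_rng(3)
z = rng.normal(0.0, 1.0, 100)
z[60:] += 0.8
print(snht(z))   # break position near 60, plus the test statistic
```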

For details on this blind validation study, see the linked post at my blog.

The international surface temperature initiative is building an open global temperature dataset and is also organizing a global benchmarking of homogenization algorithms. I hope multiple algorithms will participate.

By Victor Venema (not verified) on 25 Jan 2013 #permalink

Prabhakara

[Err, and how would you go about using that to look at the last 15 years? -W]

By Eli Rabett (not verified) on 25 Jan 2013 #permalink

Kevin, #28, were those the ones used in MBH 98? New Haven at least is familiar.

By Eli Rabett (not verified) on 25 Jan 2013 #permalink