Corrections to the McKitrick (2003) Global Average Temperature Series


The graph above, which Iain Murray claimed showed that

"The fact that the ten hottest years happened since 1991 may well be an artifact of the collapse in the number of weather monitoring stations contributing to the global temperature calculations following the fall of communism (see graph)"

comes from this paper by Ross McKitrick. McKitrick was recently in the news for publishing a controversial paper claiming that an "audit" had shown the commonly accepted reconstruction of temperatures over the past 1000 years to be incorrect, so I thought it would be interesting to "audit" McKitrick's graph.

I should first caution readers that I am not an expert in this area: I'm a computer scientist, not a climatologist. In other words, I'm no better qualified to comment on this than McKitrick is. McKitrick writes:

"The main problem in the debate over what the Global Temperature is doing is that there is no such thing as a Global Temperature. Temperature is a continuous field, not a scalar, and there is no physics to guide reducing this field to a scalar, by averaging or any other method. Consequently the common practice of climate measurement is an ad hoc approximation of a non-existent quantity."

This is untrue. Average temperature has a real, physical meaning. For example, if I have one kg of water at 20 degrees and another kg at 30 degrees, then their average temperature is 25 degrees. This is the temperature I would get if I mixed the water (ignoring any heat lost to the surroundings).
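To make the mixing example concrete, here is a minimal Python sketch. It assumes every sample is the same substance with a constant specific heat and that no heat escapes, so the equilibrium temperature is simply the mass-weighted mean; the function name `mixed_temperature` is hypothetical.

```python
def mixed_temperature(samples):
    """Equilibrium temperature after mixing samples of the same substance.

    Each sample is a (mass_kg, temp_c) pair. Assumes a constant specific
    heat shared by all samples (so it cancels out) and no heat lost to the
    surroundings, which makes the result the mass-weighted mean.
    """
    total_mass = sum(m for m, _ in samples)
    if total_mass <= 0:
        raise ValueError("total mass must be positive")
    return sum(m * t for m, t in samples) / total_mass

# Equal masses: the plain average
print(mixed_temperature([(1.0, 20.0), (1.0, 30.0)]))  # 25.0
# Unequal masses: the larger mass dominates
print(round(mixed_temperature([(1.0, 20.0), (30.0, 10.0)]), 2))  # 10.32
```

The same formula covers unequal masses: the average is weighted by how much water sits at each temperature, which is why it corresponds to something physical.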


McKitrick then reproduces this graph from GISS (his figure 2), describing it as "NASA's version of this simulacrum". He claims that the decrease in the number of weather stations is "problematic", writing:

"In the early 1990s, the collapse of the Soviet Union and the budget cuts in many OECD economies led to a sudden sharp drop in the number of active weather stations."

However, the source of the graph he reproduces gives a different reason for the drop:

"The reasons why the number of stations in GHCN drop off in recent years are because some of GHCN's source datasets are retroactive data compilations (e.g. World Weather Records) and other data were created or exchanged years ago."

I looked at the GHCN data: while the number of weather stations in the former Soviet Union did drop from about 270 to 100, the total number fell from 5000 to 2700, so the decrease there was only a small factor in the overall decrease.
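The arithmetic behind that conclusion, using the approximate station counts quoted above, fits in a few lines of Python:

```python
# Approximate counts quoted above: former Soviet Union (FSU) and all GHCN stations.
fsu_before, fsu_after = 270, 100
total_before, total_after = 5000, 2700

fsu_drop = fsu_before - fsu_after        # 170 stations
total_drop = total_before - total_after  # 2300 stations
share = fsu_drop / total_drop            # fraction of the decline due to the FSU

print(f"FSU share of the decline: {share:.1%}")  # FSU share of the decline: 7.4%
```

In other words, roughly 93% of the lost stations were outside the former Soviet Union.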

McKitrick next refers to his figure at the top of this post:

"Figure 3 shows the total number of stations in the GHCN and the raw (arithmetic) average of temperatures for those stations. Notice that at the same time as the number of stations takes a dive (around 1990) the average temperature (red bars) jumps. This is due, at least in part, to the disproportionate loss of stations in remote and rural locations, as opposed to places like airports and urban areas where it gets warmer over time because of the build-up of the urban environment."


I downloaded the raw GHCN temperature data from here, and tried to reproduce McKitrick's graph by plotting the number of stations and the average temperature of all stations for each year. If you want to check my work, the program I wrote to do the calculations can be downloaded here. The graph above is reasonably similar to McKitrick's graph. The biggest difference is that the right-hand vertical scale in McKitrick's graph is clearly incorrect: the number of stations peaked at 6,000, not 14,000 as his figure 3 indicates. (He actually has the correct number in his figure 2, which was copied from another paper.) Just taking the average of all the station temperatures is a rather poor way to estimate the global average temperature, since regions with a large number of stations count for far too much in the global average. However, even this crude way of computing the average shows significant warming in the 90s. McKitrick's graph is also rather misleading, since the GISS graph above is not calculated this way: the stations are weighted so that each region gets its correct weight.
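The program linked above was written in Perl; the Python sketch below is not that program, only an illustration of the same crude calculation (count the reporting stations each year and take the unweighted mean). The tuple layout (station_id, year, annual_mean_c) is an assumed, pre-parsed form of the GHCN data, and the toy values are invented.

```python
from collections import defaultdict

def yearly_counts_and_means(records):
    """Return {year: (station_count, arithmetic_mean_temp)}.

    `records` is an iterable of (station_id, year, annual_mean_c) tuples, a
    hypothetical pre-parsed form of the GHCN data. Every reporting station
    counts equally, which is exactly the crude, unweighted average.
    """
    by_year = defaultdict(dict)
    for station, year, temp in records:
        by_year[year][station] = temp  # keep one value per station per year
    return {
        year: (len(temps), sum(temps.values()) / len(temps))
        for year, temps in sorted(by_year.items())
    }

# Toy data, not real GHCN values: a cold station stops reporting after 1990.
toy = [
    ("A", 1989, 15.0), ("B", 1989, -5.0),
    ("A", 1990, 15.0), ("B", 1990, -5.0),
    ("A", 1991, 15.0),
]
for year, (n, mean) in yearly_counts_and_means(toy).items():
    print(year, n, mean)
```

In the toy data the raw mean jumps from 5.0 to 15.0 in 1991 even though neither station warmed, which is the kind of artefact a changing station mix can produce in an unweighted average.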


To test McKitrick's claim that the warming in the 90s might have been caused by the decline in the number of stations, all I had to do was consider only the stations that have measurements for every year from 1980 to 2000. The average temperature of those stations is shown as the green line in the graph above, while the average of all stations is in red. The blue line is the average temperature shown in the GISS graph. Note that all three lines show significant warming in the 90s. Whether you analyse the data in a crude way or a sophisticated way, you still see warming. It is true that after correcting for the change in the number of stations the warming is smaller, but the corrected curve actually agrees better with the average temperature shown in the GISS graph. If you look at the paper by Hansen et al. that describes how the GISS graph was constructed, you will find that, of course, they noticed and accounted for the change in the number of stations:

"Sampling studies discussed below indicate that the decline in number of stations is unimportant in regions of dense coverage, although the estimated global temperature change can be affected by a few hundredths of a degree."

McKitrick does not acknowledge this or cite this paper.
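The fixed-station test described above can be sketched the same way. This is a hypothetical Python version, not the Perl program linked earlier; it assumes exactly one (station_id, year, annual_mean_c) record per station per year, and the toy data are invented.

```python
from collections import defaultdict

def continuous_subset_means(records, first_year=1980, last_year=2000):
    """Average only stations that report in every year first_year..last_year.

    `records` holds (station_id, year, annual_mean_c) tuples (hypothetical
    format, one record per station-year). Using a fixed set of stations
    removes the effect of stations entering or leaving the network.
    """
    required = set(range(first_year, last_year + 1))
    years_seen = defaultdict(set)
    for station, year, _ in records:
        years_seen[station].add(year)
    keep = {s for s, ys in years_seen.items() if required <= ys}

    sums, counts = defaultdict(float), defaultdict(int)
    for station, year, temp in records:
        if station in keep and first_year <= year <= last_year:
            sums[year] += temp
            counts[year] += 1
    return {y: sums[y] / counts[y] for y in sorted(sums)}

# Toy data: station A reports every year; station B stops after 1990.
toy = [("A", y, 10.0 + (y - 1980) / 10) for y in range(1980, 2001)]
toy += [("B", y, -5.0) for y in range(1980, 1991)]
means = continuous_subset_means(toy)
print(means[1980], means[2000])
```

Station B is dropped entirely because its record has gaps, so its closure cannot create a jump in the subset average.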

The outcome of my analysis was just as I expected: if correcting for the change in the number of stations had removed the warming trend, Murray and McKitrick would already have told us about it.

In an email, McKitrick claimed that there were two problems with my test:

First, there was a change post-1990 in the quality of data in stations still operating, as well as the number of stations. Especially in the former Soviet countries after 1990, the rate of missing monthly records rose dramatically. So you need a subset of stations operating continuously and with reasonably continuous quality control.

However, the Soviet stations are only a small percentage of the total, so they don't make much difference. And of course, if you look at Hansen et al. you find that they have extensive checks on data quality.

McKitrick continued:

Second, if in this subset you observe an upward trend comparable to the conventional global average, in order to prove that this validates the global average you have to argue that the subset is a randomly chosen, representative sample of the whole Earth. Of course if this were true the temperature people would only use the continuously-available subset for their data products. It isn't, which is why they don't. It would leave you with a sample biased towards US and European cities, so it is not representative of the world as a whole. The large loss in the number of stations operating (50% in a few years) was not random in a geophysical sense, it was triggered by economic events, in which stations were closed in part if they were relatively costly to operate or if the country experienced a sudden loss of public sector resources. One can conjecture what the effect of that discontinuity was, but to test the conjecture, at some point you have to guess at what the unavailable data would have said if they were available. Because of that, I cannot see how one can devise a formal test of the representativeness of the subsample.

Now this is just wrong. You don't need a random sample to estimate the temperature across the Earth's surface. Temperatures tend to be quite similar at places that are close to each other, so you just need to spread your stations over the Earth's surface to get a representative sample. That means you can estimate what the temperature would have been at the missing stations, and you can test how representative the sample is. In fact, Hansen et al. wrote:

Sampling studies discussed below indicate that the decline in number of stations is unimportant in regions of dense coverage, although the estimated global temperature change can be affected by a few hundredths of a degree.

McKitrick, however, did not cite this paper.
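One generic way to test representativeness is leave-one-out cross-validation: predict each station from its neighbours and look at the errors. The sketch below uses inverse-distance weighting on flat, hypothetical (x, y) coordinates in km; this is a swapped-in illustration of the idea, not necessarily the method Hansen et al. used, and the function names are hypothetical.

```python
import math

def idw_estimate(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at `target` (x, y) from other stations.

    `stations` is a list of ((x, y), temp) pairs in a flat, hypothetical
    coordinate system (km). Nearby stations get larger weights.
    """
    num = den = 0.0
    for (x, y), temp in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return temp  # target coincides with a station
        w = 1.0 / d ** power
        num += w * temp
        den += w
    return num / den

def leave_one_out_errors(stations, power=2.0):
    """Predict each station from all the others; return the prediction errors."""
    errors = []
    for i, (loc, temp) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        errors.append(idw_estimate(loc, others, power) - temp)
    return errors

# Toy field where temperature rises linearly with x (invented values).
stations = [((x, y), x / 100) for (x, y) in
            [(0, 0), (100, 0), (0, 100), (100, 100), (50, 50)]]
print(max(abs(e) for e in leave_one_out_errors(stations)))
```

Small leave-one-out errors suggest the remaining stations can stand in for a missing one; large errors flag regions where coverage is too thin.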

McKitrick concludes:

None of this means that those researchers with access to the raw data can't propose and implement such tests as you propose (I wish they would).

Gee, McKitrick implies that researchers hadn't done such tests, when, as we have already seen, they had done such tests. When I challenged him on this, he contradicted himself:

I do not claim that adjustments are not being made, only that there is no formal test of their adequacy.

Presumably he talks of "formal" tests so he doesn't have to count the tests that have actually been done. (Our entire email exchange is here.)




Dr. McKitrick was evidently much in need of an audit; I'm sure that he could see this, and feels much the better now that it's been performed.

I'm surprised he took time out from their bagpipe lessons to compose such a long e-mail exchange.

BTW, last fall a couple of guys on google.sci.environment took Mann's datasets and reproduced Mann's 'hockey stick' graph.
Had M&M actually been collegial and consulted Mann, they likely would have had better M&M can only fade away.


If I have 1 kg of water at 20 deg C and 30 kg of water at 10 deg C, and I combine them, what is the final temperature of the combined masses of water?

Please submit your answers showing all relevant assumptions and reasoning.

By Louis Hissink (not verified) on 29 Apr 2004



Now for your land temperature measurements, could you repeat the calculation please, since you obviously now realise that averaging the temperature of air in one location with the temperature of air in another requires that you associate each temperature with a discrete physical quantity, since temperature by itself, being intensive, is not a quantity and is therefore uncountable.

Your bat.

By Louis Hissink (not verified) on 29 Apr 2004

Louis, Hansen et al. describe how the average is calculated. Basically, if temperature is defined by a function f(x,y) on a two-dimensional domain, the average is

(∫∫f(x,y) dx dy) / (∫∫ 1 dx dy)
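That formula is written for a flat domain. On the Earth's surface the same ratio is commonly approximated on a latitude-longitude grid, where a cell of fixed angular size has area proportional to cos(latitude). The Python sketch below assumes that gridding; it is an illustration of the integral, not the GISS procedure, and `grid_average` is a hypothetical name.

```python
import math

def grid_average(field, lats_deg):
    """Area-weighted mean of a latitude-longitude grid.

    field[i][j] is the temperature in the cell centred at latitude
    lats_deg[i] (any longitude j). Each cell is weighted by cos(latitude),
    its relative area on a sphere, giving a discrete version of
    (integral of f over the area) / (integral of 1 over the area).
    """
    num = den = 0.0
    for lat, row in zip(lats_deg, field):
        w = math.cos(math.radians(lat))
        for value in row:
            num += w * value
            den += w
    return num / den

# Two latitude bands: the equatorial band counts twice as much as the 60-degree band.
print(grid_average([[10.0, 10.0], [0.0, 0.0]], [0.0, 60.0]))
```

A plain average of the four cells would give 5.0; the area-weighted result is about 6.67 because the equatorial cells cover more of the sphere.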


What is the physical object you have measured the temperature of? Statistics deals with measured attributes of physical objects: discrete "objects".

And for that matter, average what? Mean, geometric mean? There is nothing wrong with Hansen et al.'s definition, and one can compute the average of anything in a numerical sense, but it must be mapped to "REALITY", not some imaginative construct.

So, as Ross McKitrick has pointed out, what physical object are you associating temperature with?

If I take the temperature of a child, I have measured the temperature of a discrete object. If I measure the mass of a cannon ball, of specific manufactured size, I have measured the attribute of a discrete physical object.

If I measure the temperature of the air, I have noted the temperature of two physical substances in local thermal equilibrium, the air AND the measuring instrument, by definition sensu stricto.

It matters not one whit how temperature is described mathematically, it matters a great deal what temperature IS and what physical object it is associated with.

If I have 1000 litres of water at 20 deg C, and 1 litre at 20 deg C, one could come up with the bizarre relation that as 20 deg C = 1000, and 20 deg C = 1, then 1000 = 1. Absolute nonsense, except in one case: when temperature is a ranking, or degree of hotness, then and only then does this relationship make sense.

Look at it another way. If temperature were a quantity, and we map 20 deg C with 1000 litres, and 20 deg with 1 litre, then as proxies we could assume that the temperatures could be added to estimate the amount of water.

That is, 20 deg C is a proxy for 1000 litres, and also a proxy for 1 litre. So which mapping do we assume?

Temperature is not a quantity of anything. It is a measure of a degree of energy state - it is an intensive variable.

If we think of temperature as a continuous field in a mathematical sense, then the idea of an average temperature is nonsense.

By Louis Hissink (not verified) on 30 Apr 2004


All those issues being considered, Iain Murray's point stands: the apparent increase in temperatures coinciding with a rather drastic decrease in the number of temperature stations strongly suggests that the removal of stations from the dataset has affected the reported temperatures.

As the satellite data show no significant trend for the same period, one should conclude that the observed temperature rise is an artefact of a change in the sample population, which resulted in the removal of, presumably, remote stations while stations close to urban areas were retained. Hence the increase is nothing more than a bias towards UHI effects, and the global temperature rise is illusory.

One could also point out that, on first principles, each and every temperature station is physically different, so the recorded equilibrium temperatures are the temperatures of disparate physical objects. It therefore becomes even more crucial to associate each measured temperature with a physical object.

By Louis Hissink (not verified) on 30 Apr 2004

Louis, air is a physical object. You can measure its temperature and find the average temperature of a large amount of it.
And you can take the average of a continuous field. We've known how to do this ever since integral calculus was invented. I even gave you the formula.

And satellite data now shows more warming than surface data.


You seem not to understand the concept of statistical sample support, so in terms of your comment above, you are averaging apples and pears.

The temperature at a particular location is the thermal equilibrium of the air, the water vapour it contains, and the solid earth in contact with the air at the time of measurement. These sample points are all physically different when one takes into consideration all the stations forming the data set. And I presume you are also aware that air temperature is also related to the amount of water vapour in that "indeterminate" quantity of air, so that factor also needs to be taken into account.

All that is done is to record the local equilibrium temperature that your sensing device notes at the time of reading. Temperature of what? You have not taken the temperature of a discrete physical object. Well, you have: the thermometer, and that is what is actually being averaged.

As the atmosphere is ONE physical object, if you take a point measurement of its temperature and then record another, presumably different, temperature of the same object, you have proven that the object whose temperature you are recording is not in thermal equilibrium. The earth's atmosphere is never in thermal equilibrium.

As for the average of a continuous field, true, but that is not what is being done in reality: temperatures are the data set for the calculation of an arithmetic mean, weighted by various ad hoc factors.

As for your latest satellite data, that data only applies to 30% of the earth's total surface; in fact it is even less than that since the authors note they cannot measure the temperature of snow covered areas. So it is an incomplete sample and under no circumstances representative of the earth per se.

The only statistically rigorous data are the MSU temperatures for the troposphere, and those data show neither warming nor cooling for the period 1978-2002.

By Louis Hissink (not verified) on 02 May 2004

Louis, which MSU reconstruction are you using when you state "only statistically rigorous data are the MSU temperatures for the troposphere, and those data show neither warming nor cooling for the period 1978-2002"?

Louis, the thermometer measures the temperature of the atmosphere at the point where it is located. The formula for the average temperature requires the temperature at every point in the field, so these must be interpolated from the measured values at the weather stations. One simple interpolation scheme would be to construct a triangulated network on the station locations and use linear interpolation over each triangle. That means that each interpolated value is a weighted average of station temperatures. After doing the integration, the resulting average is also a weighted average. The weights are not ad hoc, however, but depend on the locations of the stations.
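The triangulation scheme above can be sketched in Python. The integral of a linear function over a triangle equals the triangle's area times the mean of its three vertex values, so each station ends up with a weight of one third of the area of every triangle it touches. The triangulation is supplied by hand here (in practice one would compute it, for example with a Delaunay triangulation), coordinates are flat, and the function names are hypothetical.

```python
def triangle_area(p, q, r):
    """Area of the triangle with vertices p, q, r (x, y tuples)."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def station_weights(points, triangles):
    """Weight of each station in the average of the piecewise-linear interpolant.

    Each vertex receives one third of the area of every triangle it belongs
    to; dividing by the total area makes the weights sum to 1.
    """
    weights = [0.0] * len(points)
    total = 0.0
    for i, j, k in triangles:
        a = triangle_area(points[i], points[j], points[k])
        total += a
        for v in (i, j, k):
            weights[v] += a / 3.0
    return [w / total for w in weights]

def interpolated_average(points, temps, triangles):
    """Exact average of the linear interpolant: a location-weighted mean."""
    return sum(w * t for w, t in zip(station_weights(points, triangles), temps))

# Unit square split into two triangles along the diagonal from (0, 0) to (1, 1).
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
print(station_weights(points, triangles))
print(interpolated_average(points, [0.0, 1.0, 1.0, 0.0], triangles))
```

The two stations on the shared diagonal get weight 1/3 each and the other two get 1/6, purely because of where they sit. With temperatures equal to x, the result is 0.5, the exact average of x over the square.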

I don't see why the MSU temperatures should be the only one that count, but they show warming of 0.085 degrees per decade.


Likewise, I don't see why only the MSU temperatures should count, but there is considerably more doubt about how to interpret the MSU numbers.

The satellites don't directly measure the atmosphere's temperature profile; rather, it requires a lengthy analysis to determine the temperature. And there is no consensus within the scientific community on how to perform this analysis.

In addition to the results by Spencer and Christy which Tim cites in the post above, Remote Sensing Systems, C. Prabhakara et al. and K. Vinnikov et al. have all produced their own reconstructions (all of which show a significantly larger warming trend than the Spencer and Christy derived trend).

The MSU temperatures cover the whole earth, and are therefore statistically representative. The rest are not. It is as simple as that.

As for the "slight" warming of 0.085 deg C per decade, that seems very much an artefact of fitting a linear regression to truncated data. From memory, a coefficient of determination of approximately 10% for the linear fit is pretty low.

By Louis Hissink (not verified) on 03 May 2004

Yes, Louis the MSU data set is not that long, so the 0.085 might be wrong. But it's just as likely to be too low as too high. And as Ken Miles has pointed out, other researchers get higher warming trends from the MSU data. And warming trends over land are more important in any case, because that is where all the people live.


The regression fit which yields 0.085 deg C per decade, as above, has a coefficient of determination of ~10%. This is most likely an artifact of the data analysis rather than a real trend. Whether an increase or decrease, 0.085 deg C is imperceptible. Hence the data show neither warming nor cooling at this stage. Rather, one suspects that any interpreted temperature rises come from disparate data sets, incomplete modelling, or flawed theories. As Iain Murray showed with the graph from McKitrick, the correlation between a severe reduction in the number of meteorological stations and a sudden increase in station temperatures causes one initially to suspect a bias in the data towards urban heat island effects. A correct procedure would be to plot the stations on a map and see whether the remaining stations are clustered in UHI areas. That would be pretty conclusive evidence of a spatial bias. Whether that data is in the public domain is another matter. If not, then one must become extremely cynical, if not bordering on the Machiavellian in approach.

By Louis Hissink (not verified) on 04 May 2004

Tim, a comment about spatially representative sampling: the correct procedure is to take samples such that the "area" of influence is approximately equal for each sample. In practice this means having sample sites at the centre of each block of, say, 1 deg latitude by 1 deg longitude. This is, of course, extremely difficult. So one takes what one can and weights each measurement by an area of influence, or in the jargon, gives each sample a commensurate support. In terms of earth temperature this is extremely complex and possibly unsolvable. Hence the tendency to use MSU satellites, and to cross-check those data with balloons and other checks. So far the checks indicate that the MSU interpreted temperatures are probably the best approximation of the earth's temperature. Unfortunately there persists the belief that temperature is a measure of a quantity of energy, but fooling around with a laser light presentation pointer should disabuse one of this fallacy, since those things have temperatures on the order of millions of deg C. Except that 2 AA batteries don't supply much in the way of energy.

By Louis Hissink (not verified) on 04 May 2004

For Ken Miles: Ken, the usual sources, which we can all refer to.

By Louis Hissink (not verified) on 04 May 2004

Louis, you have managed to demonstrate your ignorance of statistics, spatial sampling and physics.

Statistics: A low r squared doesn't mean there is no trend present. I calculated a 95% confidence interval for the slope of that MSU data set and got 0.058-0.115. The warming trend is statistically significant. You can't dismiss it as some artifact. And this is the dataset that shows the least amount of warming.
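A low r squared and a significant slope are not contradictory, which a short Python sketch can show. It fits an ordinary least-squares trend to synthetic monthly data (an invented 0.085 deg per decade trend plus random noise, not the MSU series) and reports the slope, an approximate 95% confidence interval, and r squared. The interval assumes independent, normally distributed residuals and uses the large-sample z value of 1.96; autocorrelated anomalies, as real satellite data are, would call for a wider interval (for example via the ARIMA-style models Eli Rabett mentions below).

```python
import math
import random

def trend_with_ci(t, y, z=1.96):
    """OLS slope, approximate 95% confidence interval, and r squared.

    Assumes independent, normally distributed residuals and a large sample
    (z = 1.96 instead of a t quantile).
    """
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    sse = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, (slope - z * se, slope + z * se), 1.0 - sse / sst

# Synthetic series: 25 years of monthly values, 0.0085 deg/year trend + noise.
rng = random.Random(42)
t = [i / 12 for i in range(300)]
y = [0.0085 * ti + rng.gauss(0.0, 0.15) for ti in t]

slope, (lo, hi), r2 = trend_with_ci(t, y)
print(f"slope {10 * slope:.3f} deg/decade, 95% CI {10 * lo:.3f} to {10 * hi:.3f}, r^2 {r2:.2f}")
```

With noise this large the fitted line explains only a small fraction of the variance, yet the confidence interval stays well clear of zero: r squared measures how much of the scatter the trend explains, not whether the trend is real.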

Spatial sampling: No, your sample points do not have to be spaced equally. The problem of working out the region of influence of irregular sample points is not insoluble. In fact, it was solved almost one hundred years ago. Here's the reference: Thiessen, A.H. (1911) Precipitation averages for large areas. Monthly Weather Review, 39, 1082-1084. (And yes, I've read it.)
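Thiessen's idea is that each station represents the area closer to it than to any other station. The Python sketch below approximates those areas by dividing a rectangle into small cells and crediting each cell to the nearest station; the grid approximation, flat coordinates and the function name `thiessen_weights` are assumptions for illustration, not Thiessen's own construction (he drew the polygons geometrically).

```python
import math

def thiessen_weights(stations, nx=200, ny=200, width=1.0, height=1.0):
    """Approximate Thiessen (nearest-station) area weights on a rectangle.

    The rectangle is divided into nx * ny equal cells; each cell is credited
    to the station nearest its centre, so a station's weight is the fraction
    of the area closer to it than to any other station.
    """
    counts = [0] * len(stations)
    for i in range(nx):
        x = (i + 0.5) * width / nx
        for j in range(ny):
            y = (j + 0.5) * height / ny
            nearest = min(range(len(stations)),
                          key=lambda k: math.hypot(stations[k][0] - x, stations[k][1] - y))
            counts[nearest] += 1
    total = nx * ny
    return [c / total for c in counts]

# Two stations close together and one isolated station (hypothetical layout).
weights = thiessen_weights([(0.1, 0.1), (0.15, 0.1), (0.9, 0.9)])
print([round(w, 3) for w in weights])
```

The isolated station receives roughly half the area, while the clustered pair share the rest, so a dense cluster of stations cannot dominate the average.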

Physics: Temperature is a measure of a quantity of energy. The heat energy of an object is just the temperature times the mass times the specific heat.

Oh and another thing. How can you possibly wonder whether the data is public domain? Did you bother to read my post? I downloaded it and graphed it. And just for you, here's another graph. This just plots the averages for rural stations. No "urban heat islands" here. And still we see warming in the 90s.
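Restricting the average to rural stations only needs a filter on station metadata. In the Python sketch below, `pop_class` maps station IDs to a population class, and treating the code "R" as rural is an assumption about the inventory format; the record layout and toy values are also hypothetical.

```python
from collections import defaultdict

def rural_yearly_means(records, pop_class):
    """Yearly arithmetic mean over rural stations only.

    `records` holds (station_id, year, annual_mean_c) tuples and `pop_class`
    maps station_id to a population class. Treating "R" as rural is an
    assumption about the inventory format.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for station, year, temp in records:
        if pop_class.get(station) == "R":
            sums[year] += temp
            counts[year] += 1
    return {y: sums[y] / counts[y] for y in sorted(sums)}

# Toy data: station B is urban and is excluded from the average.
pop_class = {"A": "R", "B": "U", "C": "R"}
records = [("A", 1990, 2.0), ("B", 1990, 30.0), ("C", 1990, 4.0),
           ("A", 1991, 3.0), ("B", 1991, 31.0)]
print(rural_yearly_means(records, pop_class))
```

Because urban stations never enter the sum, any warming left in this average cannot come from urban heat islands at those stations.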

Does anyone here actually know the protocols and calibration procedures involved in measuring surface temperatures (hint, they are not measured at the surface) or are you just guessing?

Bob Grumbine's point about folk who think that other folk who do something for a living are clueless about the obvious appears to apply. Now, to go where I should not: if you look at these time series with an ARIMA or related method, is the trend significant? (Surface, Christy & Spencer, RSS, etc.)

By Eli Rabett (not verified) on 05 May 2004

I've just discovered that my earlier comment on other reconstructions of tropospheric temperature changes was incomplete. A letter was recently published in Nature with another reconstruction.

I won't spoil anyone's fun by telling them what the authors found.

Louis, if you honestly believe that "The regression fit which yields 0.085 deg C per decade, as above, has a coefficient of determination of ~10%. This is most likely an artifact of the data analysis than due to any real trend. Whether an increase or decrease, 0.085 Deg C is imperceptible. Hence the data show neither warming nor cooling at this stage."

Then a better conclusion is that the method that you are using is insufficient to find a trend.

Of course, this goes against your earlier statement "The only statistically rigorous data are the MSU temperatures for the troposphere, and those data show neither warming nor cooling for the period 1978-2002."


That is precisely what I said: the trend is a data artefact. I have consistently stated that these trends are artefacts.

As for your incomplete reconstruction, that has also been replied to.


Trends, to be significant, need to explain at least 50% of the data, unless you can show that the rest of the data are unrelated; if so, those data need to be removed and what is left re-analysed. I say this from experience using spatially correlated geochemical data, but then I have to deal with commercial reality.

Tim, your reference to spatial analysis is quite interesting; obviously you don't know about the scientific advances we in the Geosciences have made over the last 100 years. We call it Geostatistics, and I suggest you wander off to Prof. Colin Ward's office on the campus to get some background on this. We use Geostatistics in mining because it works, and from the number of mine failures the industry has had over time, it is commercial reality which makes Geostatistics correct and predictable. We are strict empiricists, and only use what works. Next thing, I suppose, you will try to convince all and sundry that there is such a thing as "negative" energy in order to substantiate the "FU-EFFECT". You are wandering into shark-infested waters, I am afraid.

Tim, temperature is a measure of energy STATE, not quantity; obviously you must then accept that 1000 litres of water @ 272 K = 10 litres of water @ 272 K.

Tim, Your post with the graph for 5/5/2004 - OK, supply the raw data, plus geographical positions of those data please; we might then work out what is really happening.

Louis, you continue to demonstrate your ignorance of statistics, spatial sampling and physics, topped off with an inability to understand simple statements.

1. Statistics. There is no statistical rule that says an r squared of 0.5 is required for significance. I even worked out the confidence interval for you to prove that it was significant. Do you even understand what a confidence interval is?

2. Spatial sampling. Yes, things have advanced in the last 100 years, but that just makes things worse for you. You claimed that a problem that was solved 100 years ago was unsolvable.

3. Physics. Temperature is a physical quantity that measures the average kinetic energy per particle. I gave you the formula for energy. It states, quite clearly, that you have to multiply by the mass, so in your example the 1000 litres of ice has 100 times as much energy.
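The point that equal temperatures do not mean equal energies can be checked with Q = m c ΔT, the heat needed to change a body's temperature by ΔT. The Python sketch below compares 1000 litres and 10 litres of ice at the same temperature. The density and specific heat of ice are approximate values and constant specific heat is assumed; both constants cancel in the ratio, so only the masses matter.

```python
def heat_to_change_temperature(mass_kg, specific_heat, delta_t):
    """Heat in joules to change a body's temperature by delta_t kelvin.

    Q = m * c * delta_t, assuming the specific heat c (J/(kg·K)) is
    constant over the temperature range.
    """
    return mass_kg * specific_heat * delta_t

ICE_DENSITY = 0.917          # kg per litre, approximate
ICE_SPECIFIC_HEAT = 2100.0   # J/(kg·K), approximate value near 0 °C

big = heat_to_change_temperature(1000 * ICE_DENSITY, ICE_SPECIFIC_HEAT, 1.0)
small = heat_to_change_temperature(10 * ICE_DENSITY, ICE_SPECIFIC_HEAT, 1.0)
print(round(big / small, 6))  # 100.0
```

Both blocks are at the same temperature, but the larger one takes 100 times as much heat to warm by one degree, which is why the mass has to appear in any energy calculation while the temperature alone does not settle it.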

In my original post I said that I downloaded the data and provided a link so that anyone else could download it. You responded by intimating that the data was somehow secret. I reminded you that my post said that I had downloaded it. Now you want me to supply it to you? Are you incapable of following a link?

Tim,

1. Temperature is the measure of the thermal energy potential or level that a substance is AT; heat content is the measure of the quantity of thermal energy that the substance has at that temperature. It is like topographic height, or electric or gravitational potential: you cannot get a quantity of height, because it has no physical existence by itself. You cannot get a quantity of temperature, since physically it has no meaning either. On your understanding, the temperature of 1 million degrees in a small laser pointer would be capable of heating a litre of water instantly. It isn't, because you are missing something: the extremely vital factor of the quantity of thermal energy associated with that temperature. In the laser the quantity of thermal energy is minuscule, hence it can be used to light a spot on my hand but not to boil a pot of water.

2. Spatial sampling: geostatistics is the theory of spatialised variables, initially described by Matheron, followed by Blais and Carlier, Agterberg and many others. It is the theory that the value of a substance depends on its position in 3D space and that sampling such variables needs to take this spatial variation into account. The theory was developed because it was discovered that when spatialised variables are not weighted by area, mass or other factors (usually defined by the variogram), severe over- or underestimates of the overall variable result; it is intrinsically related to sample-volume variance, as detailed by Hazen and others in the geostatistical literature. Hence we have developed the concept of sample support, because statistically we need to make sure we are not adding apples and oranges. And that is precisely what happens when thermometer temperatures are simply statistically analysed with no reference to a mass of some physical object.

As per the example here, the technically correct methodology is to weight the thermometer temperatures by the mass of material the temperature is associated with, times, of course, the specific heat. This is not needed when MSU temperatures are used. Different temperatures around the earth relate to atmosphere in contact with different substances of different specific heats, so if two disparate areas have the same temperature but quite different specific heats, that has to be factored into the temperature calculation, because the stored thermal energy is different at the two locations. This could be ignored only if the earth were compositionally isotropic.

3. Statistics: so 10% of the data fitting a trend is statistically significant despite 90% of the data not fitting it? Really? 10% of the data is representative of all the data, is it? I would have thought 90% of the data fitting a linear regression function would be significant, but no, we only need 10% of the data to fit the trend to make it real and significant. What do the other 90% of the data then fit? Or do we ignore that 90% of the data because they do not support our theory? This is much like saying that if I have 100 men, 10 of whom have heights of 1.75 metres and the rest anything from 2 to 3 metres, then 1.75 metres is the representative height of the 100 men. It must be, by your rules. This is a misuse of statistics. As the Number Watch has shown, even random data can have a linear regression fitted, with low coefficients of determination, but such fits are the result of truncated data and are therefore artefacts.

4. Secret data? I wrote that the decrease in the number of weather stations over time is actually removing data from the dataset. In this case it appears that the stations which were closed would probably have been the remote ones, since they cost the most to maintain, and that would have produced a bias towards stations near urban areas, and therefore a bias in the data towards the urban heat island effect. This is what McKitrick and Murray pointed out in their papers/articles. Over time, with the closure of many weather stations, the data set has become not only smaller but in all probability also unrepresentative, with a bias towards urban areas; hence the sudden rise in temperature. And the remaining stations with a continuous temperature record would be those located at airports and other low-cost localities near urban areas. That is what I wrote, that is what I meant, and I implied nothing else. But somehow you seem to think that this means I intimated that you removed data from the data set? I neither intimated nor wrote such a thing.

1. Temperature is a physical quantity. See any basic text. For example here:

Physical Quantity: a physical quantity is a quantity that measures some property of the world (or universe) around us. Examples include the height of a person, the mass of a chemical sample, the temperature of a glass of water, or a fundamental physical constant like the gas constant (R).

2. You originally claimed that it was "possibly unsolvable" to properly weight temperature samples. It isn't unsolvable, and your examples are straw men because you haven't even looked at how the weighting is actually done.

3. You clearly have no idea what an r squared value of 10% means. It does not mean that only 10% of the sample has been used. Yes, you can fit a trend to random data, but that is why the concept of statistical significance was invented. The chance of getting a trend this big from random data is very small -- much much less than one in twenty.

4. Why do you continue to throw out speculations about the GHCN data? I've told you multiple times how you can download it and check it for yourself. I also graphed the average for rural stations above. That is RURAL stations. How is an increase in the temperature of RURAL stations supposed to have been caused by urban heat islands?
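"The chance of getting a trend this big from random data" can also be estimated directly with a permutation test: shuffle the values many times, refit the slope each time, and count how often a shuffled series produces a trend at least as large as the real one. The Python sketch below is a swapped-in illustration, not the calculation Tim describes; it uses synthetic data (an invented 0.085 deg per decade trend plus noise) and assumes the values are exchangeable, which ignores autocorrelation.

```python
import random

def slope(t, y):
    """Ordinary least-squares slope of y against t."""
    n = len(t)
    tm, ym = sum(t) / n, sum(y) / n
    return (sum((a - tm) * (b - ym) for a, b in zip(t, y))
            / sum((a - tm) ** 2 for a in t))

def permutation_p_value(t, y, n_perm=1000, seed=0):
    """Estimated chance that shuffled data give a trend at least this large.

    Shuffling destroys any time ordering, so each shuffled series is the
    same values in random order. Assumes exchangeable values (ignores
    autocorrelation); the +1 terms keep the estimate away from zero.
    """
    rng = random.Random(seed)
    observed = abs(slope(t, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(t, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Synthetic series: 25 years of monthly values, 0.0085 deg/year trend + noise.
data_rng = random.Random(42)
t = [i / 12 for i in range(300)]
y = [0.0085 * ti + data_rng.gauss(0.0, 0.15) for ti in t]
p = permutation_p_value(t, y)
print(f"estimated p-value: {p:.4f}")
```

A p-value far below 0.05 means random rearrangements of the very same numbers almost never produce a trend this large, even though the trend explains only a small share of the variance.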

Rural stations? Where are your data, Tim?

Louis, I've told you over and over and over again how to get the data. Follow the link in my original post. Why is this so hard for you?


I do not need to recover the rural data from the data set to prove you wrong.

YOU have to extract them to prove YOUR point, and thus allow us the opportunity to see whether you have fudged the data or not.

Louis, I HAVE extracted the rural data and plotted the results. I have provided a link to the data and a link to the program I wrote to average and extract the data. You just refuse to look at the data.

Sorry Tim, I am looking at the data right now. But I am looking at the RAW data.

Your ball, Tim.

Louis, I wrote a program in Perl to average the data. The link is in my original post and always has been. Download it and run it.

You know, I picture Mann having the same frustration, handing off M&M to a grad student and saying to himself "F -it. I don't have time for these jokers."

Tim, you need to get a grad student for Louis.


How does the rural data look if it is taken back to 1930? The 30s and 40s were warmer than the 50s.

By Peter Bickle (not verified) on 02 Jun 2004