Signal vs Noise

Over on my latest entry in the "How to Talk to a Climate Sceptic" guide, a commenter has taken issue with this passage:

Discerning a trend from noisy data is one of the most basic processes in scientific research, so even though this argument has a naive appeal to the majority of us with no statistical training, you can be sure that any scientifically trained individual trying to make a case for cooling out of this graph is not being intellectually honest. Please consider any source of this argument as very unreliable, either by being very uninformed about basic scientific processes, or very dishonest, hoping to take advantage of less informed people.

He writes:

I could apply the IDENTICAL argument you've made (covering the period 1850 to now) to the temperature graph of the last 1000 years (the one from the IPCC report). In this, the rise of 0.6 would simply be noise, in the same way the drop you point out can also be considered noise. You can't bang on about how important stats and science are but then make statements like "Climate is generally defined as the weather conditions averaged over a long period, usually around 30 years." These are not made by people with "statistical training" or "scientifically trained" individuals - where on god's warming earth does 30 come from, except as a number which aids your argument?

Rather than create a lengthy thread so early in the life of that article, I thought it was worth discussing in a brief post of its own.

In a technical sense, he has two reasonable points: first, one person's signal is another person's noise, and second, 30 is indeed a bit of an arbitrary number when it comes to separating weather from climate.

WRT the first point, if, say, we wanted to analyze the effect of plate tectonics on global temperature, then yes indeed, pretty much everything shown in the latest 1300-year reconstruction he refers to should be considered noise. What is noise and what is signal depends entirely on what it is you are trying to learn from your data. (And yes, plate tectonics plays a very large role in climate change! For example, the separation of the Antarctic Peninsula from the southern tip of South America seems to be the indirect instigator of the formation of the West and East Antarctic ice sheets. This separation allowed the formation of the circumpolar current that inhibits heat flow from the tropics and keeps the region cold and insulated from the current global climate change.)

But on the timescales that humanity is concerned with (those that define climate and the abruptness of climate changes), it is senseless to look at the 0.8°C recent rise as just one more meaningless blip in a noisy signal. So from a practical rather than technical perspective, no, it is not reasonable to use my argument on his example.

His second point about 30 years is also interesting to consider. I would say up front, though, that I did not just pick "30" out of thin air. The 30-year figure for averaging weather to define climate is a standard figure and was not chosen so I could make the point of my article, nor was it chosen by the IPCC to aid their scare-mongering drive for world domination. Because this is the accepted definition of climate, my point stands very firmly that a single point-to-point line of one year in length does not reveal any kind of climatic trend whatsoever. So neither of his objections to my article holds any water.
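
To make the point about a one-year line versus a longer trend concrete, here is a minimal sketch. It uses a purely synthetic series (an assumed trend of 0.02 per year plus assumed random noise), not any real temperature record, and the function names are made up for illustration. It compares the least-squares slope over 30 points with the individual year-to-year changes.

```python
import random

def least_squares_slope(values):
    """Ordinary least-squares slope of values against their index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

def synthetic_series(years, trend_per_year, noise_sd, seed):
    """Hypothetical data: a straight-line trend plus Gaussian noise."""
    rng = random.Random(seed)
    return [trend_per_year * t + rng.gauss(0.0, noise_sd) for t in range(years)]

series = synthetic_series(years=30, trend_per_year=0.02, noise_sd=0.1, seed=1)
fitted = least_squares_slope(series)
one_year_changes = [b - a for a, b in zip(series, series[1:])]

print(f"30-year fitted trend:  {fitted:+.3f} per year")
print(f"largest one-year drop: {min(one_year_changes):+.3f}")
print(f"largest one-year rise: {max(one_year_changes):+.3f}")
```

In a series like this the individual one-year changes swing in both directions and are typically several times larger than the underlying trend, while the 30-point fit stays close to the value that was put in.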

But at some point, picking 30 years was an arbitrary decision, and maybe it is not ideal; I don't know.

Any takers on that question? 30 years is about half a lifetime in [developed parts of] developed nations. Maybe 60 years would have been better? Before we started shaking things up, I think people everywhere counted on some degree of consistency in weather patterns (aka climate) over not just their lifetime, but their children's and probably even grandchildren's. Maybe 30 years is just the best number to smooth out the weather noise.

It might be interesting to hear some background on this; I don't have the time to search for it, and some things even The Google doesn't tell you.

Thanks for the response. Okay, firstly, I'm aware my 'reject the null hypothesis' challenge is too simplistic - but simplicity has to be the starting point. What effect are we trying to measure here? Proposition 1 is: the effect of the predicted increases in the % of CO2 in the atmosphere attributable to humans will raise the temperature of the earth significantly. Significant I would define as measurable (ie. separable from the noise).

I can see what you mean when you say

'But on the timescales that humanity is concerned with, (those that define climate and the abruptness of climate changes), it is senseless to look at the 0.8°C recent rise as just one more meaningless blip in a noisy signal.'

but we first should try to ascertain the effects of CO2 on the temperature, should we not? The effect of any temperature change on human life is a secondary, though of course important, question - it only requires an answer if the first proposition is established. To be clear, I in fact think that the warming attributable to CO2 is real but fundamentally unmeasurable in that it is lost in the noise of a number of other more powerful temperature drivers. Until I am convinced of Proposition 1, I personally don't think it worthwhile debating the effects of it on human life. And human timescales and practicality are strictly irrelevant in determining whether Proposition 1 is true, and to take them into account would in fact be to mix up two independent issues.

I think when you say

'What is noise and what is signal depends entirely on what it is you are trying to learn from your data.'

that this is the issue. Rob Grumbin, in the link, has clearly given this a bit of thought, but he did not give enough weight to the fundamental fact that determining the 'useful' averaging period is fundamentally dependent on what you are trying to measure. For example, he says

'While we know that air is made up of molecules, it isn't a useful way to look at the atmosphere. There are far (far!) too many molecules for us to work on each of them individually. What we do instead is work on a collection of molecules -- a large enough collection.'

I don't want to cherry pick individual sentences (and not even from this site!) and go nuts on them - I'm sure Mr Grumbin would expand on what he said given the opportunity - but statements like this - similar to pulling 30 years almost from the air - are really the nub of the problem, and this is an interesting example. Whether looking at individual molecules is useful depends on and only on a) what you're aiming to measure and b) the effects you are trying to separate it from.

That is, it was certainly useful for Einstein when he first explained Brownian Motion, making assumptions about the density, speed and so momentum of whatever it was (ie. molecules) that was bumping around the pollen particles. But for a golfer deciding on which way the wind was before his shot to see how his ball would move, this level of detail would be pointless. You simply can't make blanket statements about what model is useful (and a moving average is a model, just a very simple one, of the actual data) unless you know the approximate magnitude of what changes you're trying to detect.

In other words, in a very strong sense, getting the correct model (moving average) IS the problem. It is not something you can decide on then just carry on - it is in a state of constant flux as you learn more about the effect you are trying to measure.

So to decide on the correct moving average, one must know a) the timescales that CO2 increases might be expected to work on and the approx size of the increase, and b) the timescales and the size of effects that are obscuring it ie. seasons, clouds, Milankovitch cycles, PDO, 22-year sunspot cycles, etc.

Further, there should have been a c) above - the effects which no-one knows anything about. That there are some of these with respect to the climate is surely certain.

What you do with such signals is quite clear and has been done a million times. You aim to isolate the effects of a particular input (eg. season variation) to the signal and then subtract it. Then do this as many times as you can and gradually [increase] the SNR.
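
As a rough illustration of the "isolate one input and subtract it" step, here is a minimal sketch that removes a seasonal cycle from monthly data by subtracting each calendar month's long-term mean. The data are synthetic and every amplitude is an assumption chosen for the example; the function names are hypothetical.

```python
import math
import random

def remove_seasonal_cycle(monthly_values):
    """Subtract each calendar month's long-term mean, leaving anomalies."""
    monthly_means = []
    for month in range(12):
        samples = monthly_values[month::12]
        monthly_means.append(sum(samples) / len(samples))
    return [v - monthly_means[i % 12] for i, v in enumerate(monthly_values)]

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(0)
months = 12 * 40
raw = [
    10.0 * math.sin(2 * math.pi * m / 12)  # seasonal cycle (assumed amplitude)
    + 0.002 * m                            # slow trend (assumed)
    + rng.gauss(0.0, 0.5)                  # weather noise (assumed)
    for m in range(months)
]
anomalies = remove_seasonal_cycle(raw)

print(f"variance before: {variance(raw):.2f}")
print(f"variance after:  {variance(anomalies):.2f}")
```

Once the seasonal term is gone, the remaining variance is dominated by the noise and the slow trend, which is the sense in which each subtraction improves the signal-to-noise ratio for whatever is left.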

Until people make observations and subsequent analysis that does this (and I'm not sure it is possible with climate, it could even be chaotic), I personally will consider AGW a non-problem (ie. reject the alternative hypothesis) and certainly won't be swayed by assertions that peaks and dips are just noise.

The thirty year thing may be very simply a statistics thing. If I recall the most elementary introductions to stats from high school, I seem to recall the teacher indicating more than once n=30 as being kind of a minimum number where means and st. deviations started having statistical significance (i.e., the uncertainties drop to an acceptable level).

Probably nothing really magical about the 30 years at all. The problem is that when one looks back through the climatic records, finding 30 years of stability is fairly difficult. Temperatures always seem to be rising or falling, even before AGW, droughts come along at semi-regular intervals, etc.

By Bob North (not verified) on 10 Sep 2008 #permalink

John, you seem to have reached a lot of hard conclusions, and I'll hope elsewhere you've given the solid reasons for them.

Even with your nod about cherry picking, you managed to do exactly that. Going just one sentence further gives: While we know that air is made up of molecules, it isn't a useful way to look at the atmosphere. There are far (far!) too many molecules for us to work on each of them individually. What we do instead is work on a collection of molecules -- a large enough collection. How we define 'large enough' is to consider a variable of interest, say velocity.

The next sentence addressed your complaint. We have to think about what variable/process we're trying to study. No doubt there are interesting things that can be done in situations where one can study individual molecules. Feel free to cite papers that do so in studying climate, meteorology, oceanography, ...

If one reads farther in my note, you'll see that, in fact, 20-30 years does have properties of objective importance. On the other hand, 'moving average' appears nowhere in my note.

Fundamentally, though, you're conflating at least two problems. One problem is the fundamental 'what is climate', which I addressed. This can be addressed without concern for anthropogenic CO2. If you're a scientist, understanding the system is important and interesting. If we have observations of climate from before there were 'significant' additions of CO2, then we have the even better situation of being able to gain that understanding of the system from before it was perturbed by human activity. (Plenty of non-human perturbations to understand, so a good thing that we can study how they work without the added complication of human activity.)

The second is the question of what effect CO2 has on the climate system. If you believe that 'climate' (present your definition for it) has fundamentally changed because of anthropogenic CO2, present your reasons for that. The fundamental change here is that something has to have happened to make, for instance, 20-30 years either not long enough to average out the 'noise', or to be so long that the changes from CO2 are too great (averaging that long would then be misleading us by lumping a change we want to study in with the noise we want to get away from). If you have that argument, by all means present it and support it from data.

One also shouldn't be misled by the use of 'signal' and 'noise'. As mentioned, one person's signal is another person's noise. Climate, whatever it is exactly, is not the short term average of meteorological variables. We care about the meteorology, certainly, when we're making weekend plans and trying to dodge hurricanes. But if we're looking to climate, such things are noise.

The 20-30 year sort of climate becomes uninteresting if what we're trying to do is study the 100,000 year ice age cycle. But since we'll all be dead long before the orbital variations have much to say about climate, the 20-30 year sort of climate still makes sense for our interests as citizens.

Bob:
The usual thing about 30 being a magic number in statistics is that once n is about that, things like Bernoulli problems smooth out enough that you can treat them as if they were normal curves. One can certainly have significance with samples smaller than that, but you have to do the harder work with the detailed non-normal discrete methods.
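
A small sketch of that rule of thumb, using only the standard library: it measures the largest gap between the exact binomial distribution function and its normal approximation (with the usual continuity correction) for a few sample sizes. The choice of p values and sample sizes is an assumption made for illustration.

```python
import math

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x, mean, sd):
    """P(Z <= x) for a normal distribution with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def worst_normal_error(n, p):
    """Largest gap between the binomial CDF and its continuity-corrected normal approximation."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(
        abs(binomial_cdf(k, n, p) - normal_cdf(k + 0.5, mean, sd))
        for k in range(n + 1)
    )

for p in (0.5, 0.1):
    for n in (5, 30, 100):
        print(f"p={p}  n={n:3d}  worst error={worst_normal_error(n, p):.4f}")
```

The error shrinks as n grows, and more slowly when p is far from 0.5, which is why 30 is a convention rather than a sharp threshold.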

The matter of 20-30 years is not one of finding 'stability' in the instantaneous values. That's a fool's errand as weather does happen. It's that once you average over a period of about that length, the average (and, it turned out, variance; I haven't written that up, but it's in the program you can get from a link in my what is climate - 2) of global mean surface temperature for your sample doesn't depend much on exactly how long you average. This isn't true for a 7 year average, where going to 8 can frequently change your answer substantially (as the graphic shows).
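
The idea that the answer stops depending on the exact averaging length can be sketched as follows. This is not the program linked from the note; it is a hypothetical stand-in that uses white noise in place of real annual temperatures, so it shows only the purely statistical part of the effect.

```python
import random

def mean_shift_when_extended(values, n):
    """Average absolute change in the mean when an n-year window grows to n + 1 years."""
    shifts = []
    for start in range(len(values) - n):
        window = values[start:start + n]
        longer = values[start:start + n + 1]
        shifts.append(abs(sum(longer) / (n + 1) - sum(window) / n))
    return sum(shifts) / len(shifts)

rng = random.Random(42)
# Hypothetical stand-in for annual anomalies: pure white noise, sd 0.2 (assumed).
annual = [rng.gauss(0.0, 0.2) for _ in range(500)]

for n in (7, 15, 30):
    print(f"{n:2d} -> {n + 1:2d} years: mean shifts by {mean_shift_when_extended(annual, n):.4f}")
```

With real data the interesting question is where this curve flattens out, which depends on the physical variability in the record and not just on the 1/n behaviour that white noise gives.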

Some asides:
I'm Robert on forms, Dr. Grumbine on some very rare occasions, and Bob most of the time. Never Rob.

Comments are still open and welcome on the what is climate 2 note. It would be correct to expect there to be a 'what is climate 3' to come. I think there's a fair amount to be gained by thinking about exactly what climate is.

I also have a spot for people to put up questions which don't have to be particular to any post I've already made.

Two points:

First, on the 30-year definition: the definition of EVERY word is fundamentally arbitrary. There's no Final Authority who can declare that a book is a publication with more than 30 pages and a pamphlet is a publication with less than 30 pages. A word means whatever people use it to mean. Some people draw the dividing line around 10 pages, some at 20, and some base it on additional factors. Who is to say that any one person's usage is the correct usage? This is entirely conventional. If you don't like 30 years as the dividing line between climate and weather, fine: use whatever number you want.

Moreover, it really doesn't matter. We're not trying to predict the weather exactly 30 years in the future; we're trying to figure out some basic averages for many different time frames.

The choice of temporal resolution for the window for a moving average depends on how much temporal resolution you want in your predictions. If you're only interested in temperatures 100 years from now, you can use an averaging window much larger than a decade. If you're interested in 10 years from now, you need a smaller window. And you can use any window size you want, but there *is* one wholly non-arbitrary way to make a decision: you want to minimize the absolute value of the second derivative of your resulting curve over long time spans. If you use a 10-year window and have large absolute values of the second derivative over time, then your averaging window is too small. As you widen the window, you get a smaller second derivative (good) and less resolution (bad). It's just a trade-off.
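
A minimal sketch of that trade-off, under stated assumptions: the series is synthetic (an assumed linear trend plus noise), the second derivative is approximated by the second difference of the smoothed curve, and the function names are invented for the example.

```python
import random

def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        out.append(running / window)
    return out

def mean_abs_second_difference(values):
    """Discrete stand-in for the average absolute second derivative."""
    diffs = [
        abs(values[i + 1] - 2 * values[i] + values[i - 1])
        for i in range(1, len(values) - 1)
    ]
    return sum(diffs) / len(diffs)

rng = random.Random(7)
# Hypothetical series: assumed trend of 0.01 per step plus noise with sd 0.15.
series = [0.01 * t + rng.gauss(0.0, 0.15) for t in range(200)]

for window in (5, 11, 31):
    smoothed = moving_average(series, window)
    roughness = mean_abs_second_difference(smoothed)
    print(f"window={window:2d}  points left={len(smoothed):3d}  roughness={roughness:.4f}")
```

Wider windows give a smoother curve and fewer, less time-resolved points, which is the trade-off described above.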

Second point: I think you're overlooking some important physics when you treat the greenhouse effect as just one among a huge laundry list of factors. In tackling a complex problem, the first task is to divide up the various factors into first-order effects, second-order effects, third-order effects, and so on. There are two factors to consider in assigning an order to a physical factor. The first and most important is the degree to which this factor operates independently of other factors. For example, sunlight is the first-order effect in calculating global temperatures. The second consideration is magnitude of the effect; teensy-weensy effects can be pushed way down the list. For example, the Poynting-Robertson effect certainly plays a role here, but it's so microscopic as to be something like 1000th order.

So we sort through all those effects and prioritize them. Greenhouse gases belong pretty far up there, because they are not much affected by other considerations. If the cosmic ray flux changes, or the earth's temperature changes, that won't affect the greenhouse effect itself. It falls in the "basic physics" category, so it's going to be up there next to albedo and rotation as one of the important fundamental factors. All those other factors that skeptics like to revel in (cloud formation, cosmic rays, changes in solar radiance, and so forth) are much lower down on the list because they are either reactions to higher-order effects or much lower in magnitude than the higher-order effects.

So we are not reduced to picking through a pile of physical effects and trying to figure out whether CO2 plays a role or not. We know from basic physics that CO2 definitely plays a role. That's beyond question. We can perform calculations to estimate the magnitude of those effects, and at the simple levels (or orders) of calculation, the CO2 effect is significant. Yes, we need to do the more complex calculations; but it's extremely rare in physics to find a factor that has a significant effect in low-order calculations but disappears in higher-order calculations. That would be truly weird. So we're on firm ground concluding that CO2 is having a significant effect.

By Chris Crawford (not verified) on 10 Sep 2008 #permalink

Thanks for the well thought out comments.

RG - Moving average is not a sinister attempt to put words in your mouth, it is just a term you might use in signal processing to describe the average of a time series in this way.

I did see what you said, ie. "How we define 'large enough' is to consider a variable of interest, say velocity", but that is not quite there (though this was not your main point at the time, and so I'm clearly taking these remarks out of context). Velocity of gusts? Prevailing wind averages for maps? Velocity of individual molecules? The air movement is too complex to capture exactly and so some approximation via a model is unavoidable, and I'm just pointing out that the choice of size for "large" and of the time period over which to average (ie. the choice of model) for this approximation is crucially dependent on what you aim to measure. So I am conflating the two issues you mention, but not accidentally - I'm saying that they can't be separated. That is, climate is too complex to capture totally and so some model to approximate it is always required, and this depends on what you aim to measure. You can't answer the "what is climate" question, then move onto the "what is human CO2 doing to the climate" like you can with simple tests in a lab, it is too complicated. Model and results are part of a feedback process based on the statistics and our knowledge of the climate changes - my (fairly minor) complaint would be that you spent too much time on the stats and not enough weight was given to the underlying physical processes - and so, to get back to where we were, choosing 30 years and getting on with it is not going to cut it - this figure should be constantly questioned. Maybe I'm making too much of this and you agree with this broadly anyway, in which case that was all a waste of time.

Chris - "In tackling a complex problem, the first task is to divide up the various factors into first-order effects, second-order effects, third-order effects, and so on." Couldn't agree more - the reason I can't abide these computer models is that they predict piles of feedback (ie. second order effects) without anyone acknowledging that no one really understands the first order effects. Not sure I would agree though with all the things you say are obviously second order effects (although I'm a mathematician, not a climatologist). Reflection of light is a fairly basic principle of physics also which, I understand, is part of the deal with high cloud cover. And more clouds means more water vapour which means more greenhouse gases, which again seems fairly basic to me (although I agree the connection of cloud cover with cosmic rays is yet to be demonstrated). And what about the CO2 consumed by vegetation, which everyone ignores all the time - can't imagine a clearer first order effect myself. And sunspot count correlations with the weather seem awfully first order to me.

I agree the warming effect of CO2 is first order, but its magnitude I think is simply much smaller than other first order effects. I'm not ignoring basic physics, I already was clear that I think warming due to CO2 increases is real - I just don't see how it can be important. Calculations I've seen show predicted temperature increases roughly consistent with what has happened (I'm ignoring the absurd models with the absurd feedback), even if I agree that the whole 0.6 degrees is due to CO2. And it is also a fairly basic principle (is it not?) that the temperature response to the CO2 concentrations is logarithmic (ie. as it rises, you need more for a unit increase in temperature than before). Say it goes up again by another 30%, then another... we still can only be talking about an absolute maximum 1.5 degrees in the next 100 years. How anyone can think that is a problem that requires changing our lives and immediate spending of piles of money right now is simply beyond me.
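
For readers who want to see what "logarithmic" means numerically, the relationship is commonly written as warming = S x log2(C / C0), where S is the warming per doubling of concentration. The sketch below treats S as a free parameter; the values used are assumptions chosen only to show the scaling and are not figures taken from this thread.

```python
import math

def warming(concentration_ratio, per_doubling):
    """Warming implied by a logarithmic response: per_doubling * log2(C / C0)."""
    return per_doubling * math.log2(concentration_ratio)

for per_doubling in (1.0, 3.0):  # assumed sensitivities, illustration only
    previous = 0.0
    for step in (1, 2, 3):
        ratio = 1.3 ** step  # successive 30% increases in concentration
        total = warming(ratio, per_doubling)
        print(
            f"S={per_doubling}  ratio={ratio:.3f}  "
            f"total={total:.2f}  added this step={total - previous:.2f}"
        )
        previous = total
```

Each successive 30% increase adds the same increment, so the size of the result depends almost entirely on the value assumed for S.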

And also, minimising the abs of the second derivative is non-arbitrary yes but also does not take into account the underlying physical processes and so is still not good enough for me.

John, perhaps we can cut through some of the confusion by short-circuiting this notion:

You can't answer the "what is climate" question, then move onto the "what is human CO2 doing to the climate" like you can with simple tests in a lab, it is too complicated.

We don't really need to worry about the climate. Let's throw the whole idea away. Pretend that the word "climate" is erased from the English language. Instead, ask this question:

What will increasing CO2 concentrations do to the long-term averaged globally averaged surface temperature?

and for that, we have some very solid answers.

I don't think you have a firm grip on the concept of first-order effects, second-order effects, and so on. It's funny, I had never in my education been given a definition of the concept, but it was used so frequently that I just absorbed it experientially. Again, there are two factors that are used to set the priority level of different factors in a calculation:

1. The absolute magnitude of the effect (we ignore tiny factors initially)
2. The degree to which the effect is dependent on higher-order effects.

This second factor means, for example, that the sequestration of CO2 by vegetation is a lower-order effect than the concentration of CO2 itself. Obviously, you can't calculate the effect of vegetation until AFTER you've calculated the concentration of CO2.

There is one horribly messy exception to all this: the rare case when the lower-order factor has an effect greater than the variable upon which it is dependent. This is the source of runaway phenomena, which are truly rare in nature. Most phenomena generate reactions that are smaller in magnitude than the initiating factor; this is why we have stable equilibrium situations almost all the time. But there are a few such cases involving CO2 concentrations. One arises from albedo changes caused by melting snow and ice. If the temperature warms just a little, then some of the ice melts, exposing the darker surface underneath. That darker surface absorbs more sunlight, so the temperature rises a bit more, and more ice melts -- the situation can run away on you. The ice caps have been in rough equilibrium for centuries, but now that we're increasing the CO2, causing the temperature to rise slightly, we're seeing this runaway effect on the northern ice cap. Data on the southern ice cap is still uncertain.
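
The difference between a stable reaction and a runaway can be sketched with a toy feedback loop. This is a generic illustration and not a climate model: `gain` is a hypothetical number for how much of each increment comes back as a further increment.

```python
def feedback_response(initial, gain, rounds):
    """Sum an initial change plus repeated feedback increments, each gain times the last."""
    total = 0.0
    increment = initial
    for _ in range(rounds):
        total += increment
        increment *= gain
    return total

for gain in (0.3, 0.5, 0.9, 1.1):  # assumed gains, illustration only
    after_20 = feedback_response(1.0, gain, 20)
    after_60 = feedback_response(1.0, gain, 60)
    print(f"gain={gain}: after 20 rounds {after_20:.2f}, after 60 rounds {after_60:.2f}")
```

For a gain below 1 the total settles toward initial / (1 - gain), so the reaction amplifies the original change but stays bounded. At a gain of 1 or more the total keeps growing with every round, which is the runaway case.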

On the role of minimizing the absolute value of the second derivative, that is solely a method of separating signal from noise. It is not meant to address the underlying physical mechanisms; it's just a way of determining that there is in fact a signal there. Later on, when we get much better at all this, we can start fitting the data directly to the output of our models, but for now the point of this particular exercise is to demonstrate that the rise in temperature we are experiencing is real and anomalous.

And what about the CO2 consumed by vegetation, which everyone ignores all the time - can't imagine a clearer first order effect myself.

"Everyone" in this case does not include the scientists working on the problem. Please see IPCC AR4, Chapter 7, page 526, "Terrestrial carbon cycle processes and feedbacks to climate".

And sunspot count correlations with the weather seem awfully first order to me.
Please see IPCC AR4, Chapter 2, page 188, "Solar variability".

By Chris Crawford (not verified) on 11 Sep 2008 #permalink

John, a quick point about matching the observed temperature increase to the observed CO2 rise. The global surface temperature has a lot of inertia behind it; the full effect of the current rise is not expected to be realised until several decades from now. This is because of the large amounts of heat the oceans can absorb without warming much.

Lindzen continually makes this basic mistake, I suspect intentionally, when he calculates his very low sensitivity figures. See this article.

Ok, I like the arguments being made here. I'm clearly being a bit slapdash at times (yes, I was just throwing around "first-order" without any thought really) and am being (appropriately) put in my place. I'm far from converted but will absorb your comments and will no doubt have other questions (if anyone still cares).

John, coby's "How to Talk to a Climate Skeptic" is an excellent starting place, but if you really want to dig into the gore, I urge you to have a look at the IPCC AR4 reports, which you can find at http://ipcc-wg1.ucar.edu/wg1/wg1-report.html

By Chris Crawford (not verified) on 11 Sep 2008 #permalink

re 30 years (again)

A little research can go a long way - 30 years is not included in WMO's definition of climate, which is:

"Synthesis of weather conditions in a given area, characterized by long-term statistics of the variables of state of the atmosphere in that area" [WMO, Guide to Climatological Practices, 2nd edition, Glossary] (found here - wmo climate guide)

30 years appears to come from the definition of "climatological climate normals" or "normals". The glossary also mentions "periods" as any period of 10 years or longer starting on Jan 1 of a year ending in one (i.e., decadal averages), which suggests that we are not limited to using only 30-year averages. Therefore, as noted by a poster above, it seems that what period to use is somewhat dependent on what your period of interest is: the entire Holocene, the Pleistocene and Holocene, or the last 50 years.

By Bob North (not verified) on 12 Sep 2008 #permalink

I did find this on the WMO FAQ page:

Climate in a narrow sense is usually defined as the "average weather," or more rigorously, as the statistical description in terms of the mean and variability of relevant quantities over a period of time ranging from months to thousands or millions of years. The classical period is 30 years, as defined by the World Meteorological Organization (WMO). These quantities are most often surface variables such as temperature, precipitation, and wind. Climate in a wider sense is the state, including a statistical description, of the climate system.

http://www.wmo.ch/pages/prog/wcp/ccl/faqs.html

This passage also reinforces that this is all relative to what you are interested in, as discussed in the interesting comments above.