More Stupid Graphs

Remember the post I made a couple of weeks ago, flaming the Wall Street idiots for
a bad graph?
They were comparing the value of financial firms before and after the current
mess. But the way that they drew it was using circles, where the diameter of the
circle was proportional to the values, but the way it was drawn strongly suggested that
the area was the metric of comparison.
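To make the distinction concrete, here's a small Python sketch of what happens when the diameter, rather than the area, is drawn proportional to the data. The function names are illustrative, not taken from either chart.

```python
import math


def area_ratio_for_diameter_encoding(value_a: float, value_b: float) -> float:
    """Ratio of circle areas when each diameter is drawn proportional to its value."""
    # A circle's area is pi * (d / 2) ** 2, so if d is proportional to the value,
    # the area grows with the square of the value.
    area_a = math.pi * (value_a / 2) ** 2
    area_b = math.pi * (value_b / 2) ** 2
    return area_a / area_b


def diameter_for_area_encoding(value: float, scale: float = 1.0) -> float:
    """Diameter to draw if the circle's *area* should be proportional to the value."""
    # Solve pi * (d / 2) ** 2 == scale * value for d.
    return 2 * math.sqrt(scale * value / math.pi)


# One value twice another: doubling the diameter quadruples the area.
print(area_ratio_for_diameter_encoding(2.0, 1.0))  # 4.0
```

With area encoding, `diameter_for_area_encoding(2.0)` comes out only about 1.41 times `diameter_for_area_encoding(1.0)`, which is exactly what keeps the two areas in a true 2:1 ratio.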

Well, an astute reader sent me another example of the same error - but it's even
worse. This one is misleading in two ways. Take a look and see if you can figure out
what the two errors are. I'll explain beneath the fold.

[Figure: map of US state unemployment rates drawn as circles (unem-graph.png)]

The first problem is exactly the same as the one the Wall Street idiots made: the
diameter of the circles is proportional to the actual data - but the use of circles as a visual presentation strongly suggests to any reader that the area is the real measure.

The other error is a bit more subtle. The measure represented in the chart is
unemployment rate - a percentage figure. But the presentation as a simple area
strongly suggests to the viewer that the area is proportional to the number of unemployed people. So, for example, looking at the chart, you'd think that Florida has many more unemployed people than Texas - from the bubbles, it looks like Florida has about twice as many unemployed as Texas.

Based on a bit of searching, Texas has an unemployment rate around 6.6 percent, and a total number of unemployed somewhere around 781,000 according to the Texas Workforce Commission. According to Florida International University, Florida has an unemployment rate of about 8.6 percent, for a total number
of unemployed just under 800,000. In other words, Florida and Texas have roughly the same
number of unemployed people.
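Since a rate is just the number of unemployed people divided by the labor force, those two pairs of figures also imply how big each state's labor force is. Here's that back-of-the-envelope check as a quick sketch, using the numbers above (Florida's "just under 800,000" is rounded to 800,000, so treat the result as approximate; `implied_labor_force` is an illustrative name):

```python
def implied_labor_force(unemployed: float, rate_percent: float) -> float:
    """Labor force implied by a count of unemployed people and a rate in percent.

    rate = unemployed / labor_force, so labor_force = unemployed / rate.
    """
    return unemployed / (rate_percent / 100)


texas = implied_labor_force(781_000, 6.6)    # Texas Workforce Commission figures
florida = implied_labor_force(800_000, 8.6)  # "just under 800,000", rounded up here

print(f"Texas:   {texas:,.0f}")    # Texas:   11,833,333
print(f"Florida: {florida:,.0f}")  # Florida: 9,302,326
```

So Texas reaches roughly the same count at a lower rate simply because its labor force is larger: about 11.8 million versus 9.3 million.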

So looking at the graph, you'd guess that the number of unemployed people in Florida is far larger than the number in Texas - which is not true. And even if you know that those bubbles represent percentages, you'd guess that the rate in Florida is roughly double
Texas's, when in fact it's around 30% higher.
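Here is the arithmetic behind that comparison as a short Python sketch. It assumes, as described above, that each circle's diameter is drawn strictly proportional to the state's rate; the variable names are illustrative:

```python
texas_rate = 6.6    # percent, as quoted above
florida_rate = 8.6  # percent, as quoted above

# How much higher Florida's rate actually is.
rate_ratio = florida_rate / texas_rate

# With diameters proportional to the rates, the areas differ by the square of
# that ratio, and area is what the eye tends to compare.
apparent_area_ratio = rate_ratio ** 2

print(f"actual ratio of rates:      {rate_ratio:.2f}")           # 1.30
print(f"ratio of the circles' area: {apparent_area_ratio:.2f}")  # 1.70
```

A real difference of about 30% turns into a difference of about 70% in area, most of the way to the "twice as much" impression the map gives.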

The basic error there is presentational - but still important. The way that you present
data has a significant impact on how that data is going to be interpreted by people who see it. Certain presentations imply certain interpretations.

A percentage figure is a fraction, and so it should be represented by something that
illustrates the fact that it's a fraction. Presenting percentages via area-bubbles the way
that this chart does is pretty much guaranteed to produce misunderstandings. Area
bubbles imply that the area represents a quantity, not a fraction. So most people looking at
that will think that, for example, the graph is saying that there are more unemployed people
in Alaska than there are in Texas - when in fact the opposite is true. The unemployment
rate is higher in Alaska, but the number of unemployed people in Alaska is
much smaller than the number in Texas. In fact, according to the US Census Bureau,
the entire population of Alaska is smaller than the number of unemployed people in
Texas!

There's even a third problem with it, but it's less serious than the other two. The
diameters of the circles represent the state unemployment rates. At the bottom of the chart,
there's a time-based bar-chart of the national unemployment rate. The height of
those bars is directly comparable to the diameter of the circles. But because of the difference in presentation, you would not expect to be able to compare them. In a proper
chart, like measures should be presented in like manner. If you're going to present a measure as a circular area, then everywhere in a particular chart that you present that measure,
you should use a circular area. I don't think that anyone looking at that chart
would have the slightest clue that they're supposed to compare the diameter of the circles on the map to the height of the bars on the timeline.

If you wanted to do a graph like this, the appropriate presentation would be something
like pie charts: something that clearly illustrates the fact that the relevant measure is a fraction. In fact, if you took that graph, and made the area of the circles proportional to the population of the states, and then used each circle as a pie graph illustrating the unemployment rate, you'd have an extremely informative diagram. But as presented,
it's worse than worthless - it's actively misleading in multiple ways. And I don't even think that it's misleading because someone was trying to present the data in a skewed way; it's misleading simply because the person or people who made it were too stupid to do it right.
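The pie-glyph version described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: `pie_glyph` and its `people_per_unit_area` scale are hypothetical names, the example populations are made up, and since an unemployment rate is a fraction of the labor force rather than of the whole population, scaling the circles by labor force would be a slightly more consistent choice.

```python
import math


def pie_glyph(population: float, rate_percent: float,
              people_per_unit_area: float = 1_000_000.0) -> tuple[float, float]:
    """Return (radius, slice_angle_degrees) for one state's pie glyph.

    The circle's area is proportional to population, and the highlighted
    slice is the unemployed fraction of the full circle.
    """
    area = population / people_per_unit_area  # area encodes population
    radius = math.sqrt(area / math.pi)        # invert A = pi * r ** 2
    slice_angle = 360.0 * rate_percent / 100  # the rate as a share of 360 degrees
    return radius, slice_angle


# Two hypothetical states with the same 8% rate but a 4:1 population ratio:
# the bigger state's circle has twice the radius, but the slice angle is identical.
small_r, small_angle = pie_glyph(10_000_000, 8.0)
big_r, big_angle = pie_glyph(40_000_000, 8.0)
print(big_r / small_r, small_angle, big_angle)
```

Reading the slice gives you the rate, and comparing circle sizes gives you how many people are involved, which are exactly the two things the bubble map runs together.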


Is there a way to blow that up? As it is, I can barely read the text.

Actually, I think a better way to draw the map version would be to simply shade the states themselves, either in grayscale (lighter shades for less unemployment, darker for more) or with red-orange-yellow-green shadings (though those are hard on people with red-green colorblindness and are generally not used much anymore).

By Joe Shelby on 30 Mar 2009
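Joe's shaded-map suggestion is simple to implement. A minimal sketch, assuming a linear scale between the lowest and highest rates on the map (the function names are hypothetical, and a real map would also need a legend):

```python
def gray_level(rate: float, lowest: float, highest: float) -> int:
    """Map an unemployment rate to an 8-bit gray value for shading a state.

    Lighter (toward 255) means less unemployment and darker (toward 0) means
    more, scaled linearly between the lowest and highest rates on the map.
    """
    if highest == lowest:
        return 128  # all states equal: use a neutral mid-gray
    # Clamp so values outside the range still give a valid gray level.
    t = min(max((rate - lowest) / (highest - lowest), 0.0), 1.0)
    return round(255 * (1.0 - t))


def to_hex(level: int) -> str:
    """Format a gray level as an HTML/SVG color such as '#808080'."""
    return f"#{level:02x}{level:02x}{level:02x}"


# Using the two rates quoted in the post as the ends of the scale:
print(to_hex(gray_level(6.6, 6.6, 8.6)))  # #ffffff
print(to_hex(gray_level(8.6, 6.6, 8.6)))  # #000000
```

In practice the ends of the scale would come from the lowest and highest rates across all the states. Because only lightness varies, this also avoids the red-green problem Joe mentions.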

The lower chart also has a y-axis that doesn't start at 0, which exaggerates the percentage change.

By Steve Downey on 30 Mar 2009

I've got to say I think you're wrong on this one. I won't say the whole circles on a map thing is the bestest graph ever, but I don't think your specific criticisms hold water:

1. area vs. diameter:

I think this was more clearly wrong on the Wall Street thread, where the circles were nested and the volume was definitely misleading in at least some cases. Here, I think diameter might actually be closer to right.

What matters is how they're perceived. For example, area-wise, I'd guess that Michigan's dot is not quite twice the area of Oregon's. But I, and I think a lot of people, would actually say that a dot like Florida's or Mississippi's is closer to half the "size" of Michigan's.

Probably there is some complicated perceptual metric one could use to get this exactly right. It certainly might be better just to use bars or something. But if you're using circles, I think diameters are at least as good as area in this case.

2. Rate vs. number:

The purpose of the chart is to compare the scale of the unemployment problem across states. And unemployment rate is a better metric of that than total unemployment. (Imagine that Wyoming has 50% unemployment and California has 5%. Which state has a bigger problem? Which state has a bigger dot using total unemployment?)

I also don't really think it's correct to think of a number like unemployment percentage as a fraction. We're not usually concerned with any "whole" here, just scaling the quantity by population so we can make comparisons. (Calling it a fraction might be technically correct in some sense, but it's missing the point. The fact that unemployment is a fraction of the total workforce is important to the definition, but not important at all for making comparisons or rule of thumb judgments about the scale of an unemployment problem - which are almost entirely what it's used for.)

3. Bottom bars vs. map:

I'm not even sure what you're talking about here. The bars at the bottom are a time series, and don't seem to have any geometrical relationship with the circles AT ALL. I think the 2009 bar is about twice as high as the diameter of even the largest circle on the map - but that doesn't make any sense if it's supposed to be a (weighted) average of all the circle diameters.

It might be nice to design a time series that could be compared to the map - but this one clearly can't be. (It also wouldn't be all that useful, since there are no time series for the states, and that's probably more information than is useful anyway.) (P.S.: The national/state comparison - for the current year only - is actually that little blue US map down in the Gulf.)

Again, I'll certainly agree that this is a somewhat poorly conceived chart. Just not, as far as I can see, a wrong one.

By jack lecou on 30 Mar 2009

I'm with Jack Lecou on this one.

The point of the graphic was to give the reader an idea of what parts of the country were hardest hit by unemployment, not to deliver the raw numbers to the end user. While the author may have goofed by distorting the sizes due to scaling the diameter instead of the area, I think he was right to focus on unemployment rate, and display it relatively rather than as part of a whole.

Joe Shelby's idea of color coding the states is a good one, as it would have sidestepped the issue altogether.

Looks like the entire population of Michigan is unemployed..

The point of the graphic was to give the reader an idea of what parts of the country were hardest hit by unemployment,

Well, Jack Lecou and Joel, in that case it looks like Florida has twice the problem, but I'm pretty sure that 8.6% isn't twice 6.6%. But maybe my math skills are rustier than yours. So the graph is terribly misleading, no matter how you slice it.

There's yet another way in which the graphics are misleading.

It has to do with the geographical size of the areas represented by the circles. The East coast states are smaller in size than, say, Texas. Texas has an unemployment rate of 6.6, you say. If Texas were instead replaced by ten smaller states each having an unemployment rate of 6.6 (to keep it simple), Texas' single blue disc would be replaced by ten discs of the same size as the single disc. At first glance the area would appear to be "much more unemployed". The graphical presentation of unemployment on the East coast suffers from this.

By Kristian Z on 30 Mar 2009

That graph sucks in so many interesting ways. First of all, the circles overlap, so you can't even really see most of the ones on the East Coast. People have a lot of trouble comparing areas, and the "Play the Timeline ->" in the upper right corner suggests it's animated over time, which is terrible to use too.

Pie charts might not be that helpful either though---people have as much trouble comparing angles as they do areas.

I would much rather have had the data in a table: states down the rows, and columns for populations and unemployment rates in different years. It would be much easier to compare states or times (which, in this chart, requires a lot of back and forth).

Well, Jack Lecou and Joel, in that case it looks like Florida has twice the problem, but I'm pretty sure that 8.6% isn't twice 6.6%. But maybe my math skills are rustier than yours. So the graph is terribly misleading, no matter how you slice it.

Well, and I'll admit this is a stretch and may or may not have been the idea, but a certain low level of unemployment is not a problem (2-3%, say), so it'd be at least halfway valid to use that as a baseline. Or it could even be that the bubbles don't directly correlate to the underlying unemployment figures at all, just their rank. That would be somewhat odd, but not necessarily misleading, in my opinion.

In any case, you only get in trouble when you try to map the size of the bubbles directly back into unemployment numbers. But I think you're overthinking it - the intent of the chart is to convey an entirely number-free impression. It's not clear to me so far that the impression conveyed is actually wrong, or even misleading. Again, what the graph is saying is that the problem in Florida is maybe about twice as bad as Texas, not necessarily that any standard statistic is. (Which is arguably true - 8.6% is certainly a lot worse than 6.6%.)

What'd be really nice is to have a link to the original chart, which evidently has popup details for each state, with the actual numbers used. It could be there's actually a legend somewhere in the interactive version too.

Again, ugly chart, but I think the biggest problem is the scale of the bubbles, obscuring everything on the East Coast, not any actual innumeracy.

By jack lecou on 31 Mar 2009

This representation makes it clear that the problem in the North East is huge. The unemployed are overflowing into the nearby states. We need to call out the National Guard, round them up, and save the economy.

Maine has over 100% unemployed. Michigan has maybe 200% unemployed. These circles must be drawn by those innumerate sport coaches, who tell their players to go out and give 110%, probably 111% in the case of Spinal Tap!!!!111!!!!111!!!

Some of these people are so hard working that they are unemployed at 2, 3, or even 4 jobs at the same time. Where do they find the energy?

Nothing points out the seriousness of the problem like this fish tale chart. The unemployment rate is so bad, it's this big.

jack lecou,

And unemployment rate is a better metric of that than total unemployment. (Imagine that Wyoming has 50% unemployment and California has 5%. Which state has a bigger problem? Which state has a bigger dot using total unemployment?)

If we were to apply that to this chart, California would be a bit smaller, but Wyoming would probably be off the page. This would make it clear that Wyoming can bring about the end of the world as we know it. We really do need ways to present information that is informative. This is as informative as a neon sign for the blind.

This chart is just hyperbole. What would this chart look like if the Great Depression unemployment rates were represented according to this scale? A bunch of indistinguishable blobs that extend far beyond the edge of the page. This is such a bad chart, it should be used for this purpose, and this purpose only - to teach people how not to make charts.

Rhode Island, Connecticut, and Delaware combined must have a bazillion percent unemployment.

The map of the US appears to have more than 2/3 of the area covered with the unemployment bubble of doom. Do we have 66.6+% unemployment? Is this the work of the Anti-Christ?

The map of the US has 100% of the height covered. Since the height of these bubbles seems to be what they are using for measurement, this must mean 100% unemployment.

Everybody involved in the publication of this tragedy should have the opportunity to experience unemployment first hand. Even then these idiots may not understand the unemployment situation. This chart is ridiculous, so I feel that just a touch of ridicule is in order. :-)

(Had a longer post, but looks like it got eaten. Apologies if this ends up as a duplicate.)

Well, Jack Lecou and Joel, in that case it looks like Florida has twice the problem, but I'm pretty sure that 8.6% isn't twice 6.6%. But maybe my math skills are rustier than yours. So the graph is terribly misleading, no matter how you slice it.

Well, not to defend this too much, but there are a variety of reasons this might be. For example, they might be using a non-zero baseline (which is more or less valid, as a few percent unemployment is "normal"). Now, that might be questionable, but we really can't say without the original interactive chart - which evidently has pop up details on each state (with the actual numbers), and maybe a better key of some kind.

What you have are questions about the mapping between circle size and unemployment rate - but that's only "misleading" if you suppose that mapping is supposed to be direct or easily reversible.

Basically, I think you're overthinking it. You run into trouble when you try to map the bubbles directly back into numbers, but that's just not how the chart is intended. It's meant to give a completely number-free impression, and I haven't seen anyone yet explain why that impression is wrong or even misleading.

Again, the chart gives the impression that Florida's problem is twice as big, which is arguably close enough. (When you're talking unemployment, 8.6 is a lot worse than 6.6.)

By jack lecou on 31 Mar 2009

Rogue Medic-

It's not clear to me how much of your comment is tongue in cheek ridicule, but you do seem to be assuming for some reason that the size of the circles ought to be related to the geographic area of the states.

I don't see how that's implied, or how it would make it a better graph. Area has nothing to do with either population or unemployment - it would just serve to make small states invisible, even if they had severe unemployment.

However, I would agree that the bubbles should be uniformly scaled down, small enough that the bubbles mostly fit inside their respective states, and to make the East Coast visible (labeled lines to offshore bubbles might be useful for e.g., DC, DE, etc.)

Also, I don't understand what "Wyoming off the page" even means.

My point about Wyoming and California was that California has a population of 36 million. Wyoming has 500,000. Say the workforce is 50% of both. Then if Wyoming has 50% unemployment and CA 5%, graphed on a chart where bubble area equals workers, Wyoming's 125k unemployed would disappear next to California's 900k. Yet Wyoming's 50% unemployment would be an epic catastrophe next to which California's 5% is barely a concern.

All of which is to say that using the raw number of unemployed workers to make comparisons between states is, 99% of the time, completely pointless and far more misleading than a slightly opaque bubble-size formula.

By jack lecou on 31 Mar 2009
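Jack's Wyoming/California numbers check out. Here's the same arithmetic as a quick Python sketch; the populations, the 50% workforce share, and the 50%/5% rates are his hypothetical scenario, not real data, and `unemployed_count` is an illustrative name:

```python
def unemployed_count(population: float, workforce_share: float, rate: float) -> float:
    """Unemployed people = population * share in the workforce * unemployment rate."""
    return population * workforce_share * rate


# Hypothetical scenario from the comment above.
california = unemployed_count(36_000_000, 0.5, 0.05)  # about 900,000 people
wyoming = unemployed_count(500_000, 0.5, 0.50)        # about 125,000 people

# By raw count, California looks about 7 times worse...
print(f"count ratio (CA/WY): {california / wyoming:.1f}")
# ...while by rate, Wyoming is 10 times worse.
print(f"rate ratio (WY/CA):  {0.50 / 0.05:.1f}")
```

The two ratios point in opposite directions, which is the whole disagreement in this thread: counts tell you how many people are affected, rates tell you how hard a given state is hit.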

Incidentally, I think what I was reacting to mostly initially was the suggestion that:

...looking at the graph, you'd guess that the number of unemployed people in Florida is far larger than the number in Texas

I certainly wouldn't have guessed any such thing. Nor, I think, would anyone who's even passingly familiar with the ways we usually measure and compare unemployment. In this chart, it would have been meaningless.

Same with asking that unemployment percentage be treated as a fraction. 99% of the time, it's not. It's usually treated quite appropriately like a quantity with units of "unemployment points". Or, maybe a sort of dimensionless quantity (people/people), like an f-number.

At any rate, to say that you should treat it as a fraction when graphing it is ludicrous. Presumably that means representing the numbers using pie charts or stacked bars or something, but with a state-to-state comparison I'm not sure how that would even be supposed to work, let alone how it would make anything clearer.

By jack lecou on 31 Mar 2009

Again, the chart gives the impression that Florida's problem is twice as big, which is arguably close enough. (When you're talking unemployment, 8.6 is a lot worse than 6.6.)

Wow, really, that is nonsense with a capital "non". "A lot worse" I can agree with, but you're saying that 8.6% is twice 6.6% and that's stupid, frankly.

There's yet another way in which the graphics are misleading. It has to do with the geographical size of the areas represented by the circles. The East coast states are smaller in size than, say, Texas. Texas has an unemployment rate of 6.6, you say. If Texas were instead replaced by ten smaller states each having an unemployment rate of 6.6 (to keep it simple), Texas' single blue disc would then be replaced by ten discs of the same size as the single disc. This would make the area appear to be "more unemployed".

By Kristian Z on 31 Mar 2009

When I go through the ridiculous exercise of measuring pixels, I get about 21px and 32px for the diameters. That's a 50% increase, whereas 6.6 vs. 8.6 is more like 30%.

It's twice if you assume a baseline of 3%, which isn't the way I'd do it, but isn't entirely crazy.

Without measuring all the little circles on the chart and looking up the numbers, I'm guessing what they did was use a 2 or 3 percent baseline. That's slightly sketchy, especially since they don't really say that (as near as we can tell, but all we're going on is a screenshot of an interactive chart...), but all it really does is highlight the differences, which is perfectly fine for an impression.

It's only a problem if you assume that the way you're supposed to read the actual numbers off is to take a ruler and count pixels...

By jack lecou on 31 Mar 2009

It's twice if you assume a baseline of 3%, which isn't the way I'd do it, but isn't entirely crazy.

Sorry, I mean that both 3.6%/5.6% and 21px/32px are about the same ratio.

By jack lecou on 31 Mar 2009
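Jack's baseline guess can be checked against his own pixel measurements. A small sketch; the 21px and 32px diameters are his measurements from the screenshot, the 3% baseline is his guess rather than anything the chart states, and `ratio_above_baseline` is an illustrative name:

```python
def ratio_above_baseline(high: float, low: float, baseline: float = 0.0) -> float:
    """Ratio of two values after subtracting a common baseline from both."""
    return (high - baseline) / (low - baseline)


measured = 32 / 21                                        # measured diameters, in pixels
zero_baseline = ratio_above_baseline(8.6, 6.6)            # rates drawn from zero
three_pct_baseline = ratio_above_baseline(8.6, 6.6, 3.0)  # i.e. 5.6 / 3.6

print(f"measured diameters: {measured:.2f}")            # 1.52
print(f"zero baseline:      {zero_baseline:.2f}")       # 1.30
print(f"3% baseline:        {three_pct_baseline:.2f}")  # 1.56
```

The 3% baseline lands much closer to the measured ratio, which fits his reading, though one pair of circles is thin evidence for what the chart actually did. The same subtraction is why a y-axis that doesn't start at zero, as Steve Downey pointed out for the bar chart, exaggerates differences.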

Jack lecou,

Actually, your argument seems more than a little problematic. I do agree that the unemployment rate is not as bad a measure as Mark suggests, but it is not nearly as good a choice as you seem to be saying. In your conclusion you say that using raw numbers is pointless and misleading 99% of the time, but for an example you have one state with a 5% unemployment rate and the other with a 50% rate. This is supposed to be a typical comparison? Perhaps in that bizarre case, far different from what is happening now or, for that matter, from anything that has ever happened, raw numbers might be grossly misleading. But consider the case where the unemployment rates are 8% in California and 12% in Wyoming. The use of diameters and percentages would indicate that the two are about the same, or at least in the same ballpark. Hardly a huge difference, right? However, given your numbers, it means the number of unemployed in California is 50 times larger than the number in Wyoming. I would say that that is a much larger problem, and a problem more closely related to what is happening now, and what normally happens, than the example you chose.

jack lecou,

If Wyoming, or any other state, were to have a 50% unemployment rate the bubble would probably extend beyond the edge of the page. You have measured the pixels, so you may know the number of pixels required for 50% and the number of pixels to wander off the page with a circle.

A lot is tongue in cheek. This graph is like a bad haiku. You are correct that area and population do not correlate. Delaware and Wyoming are good examples for this. With this graph, I look at it and wonder if the person responsible was possessed by the spirit of Jackson Pollock.

Perhaps as an interactive page, there is more information. Perhaps it is even useful. More likely, someone got carried away with a feature and overused the interactivity. The interactivity probably does not make up for the unemployment droppings all over the map.

For an example of good charting of economic data, my favorite economic data site is Econompic. The data is presented in a way that is informative and easy to look at. I have no ties to Econompic.

Wow...some of the defenses of this chart are almost as bad as the chart itself. Whenever I see ANY presentation of data, the first thing I always ask is - "compared to what?"

1) Using a non-zero baseline on ANY chart is a misleading way of presenting data.

2) There is NO way to defend the use of area to display data if you're using the underlying number as the diameter. A chart is a 2-D medium, as is area. Therefore, area should represent the number - sam reason

3) Trying to compare rates of one state to another, say California to Wyoming, IS a bad idea...especially when you're trying to show a National Trend. This is sort of related to one of the worst maps of all time - the plot of the electoral college map which seemed to imply that electoral college votes are allocated by square mile rather than population. If you're going to compare the RATES of Wyoming to California, you have to keep in mind that California is about 50 times the population of Wyoming, so even though Wyoming might have a higher rate, the sheer number of actual unemployed in California would be staggering. In other words, if Wyoming DID have a 50% unemployment rate, it would be bad for Wyoming, but the OVERALL effect on the NATIONAL rate would be minuscule, given that Wyoming is the smallest state population-wise.

1) What is the point of doing this by State? Especially since we can't determine which bubble goes with which state! It might have been better to group rates by region.

By jjsocrates on 01 Apr 2009

If Wyoming, or any other state, were to have a 50% unemployment rate the bubble would probably extend beyond the edge of the page. You have measured the pixels, so you may know the number of pixels required for 50% and the number of pixels to wander off the page with a circle.

There's obviously no constant number of pixels required. If Wyoming had the highest unemployment, one would scale all the bubbles uniformly so that Wyoming's fit roughly inside Wyoming - and all the others would be even smaller.

In fact, I've seen some of these bubble charts with a slider next to them so that the viewer can scale them up and down as they please. (For all we know, this one had one too.)

By jack lecou on 01 Apr 2009

1) Using a non-zero baseline on ANY chart is a misleading way of presenting data.

That seems like a rather unlikely rule. Not to be snarky, but do you always graph temperatures in Kelvins?

In this case, I do think whatever they're doing is amplifying the differences a bit. But it's only a problem if you're expecting a linear relationship. Come on. They're frigging bubbles.

What I get from the chart is: worst in MI; bad in the SE, Pacific coast, and RI; kinda bad in the south and Appalachia; not awful in the plains, the mountain west, and (if you can make it out through the poorly scaled bubbles), most of the NE.

I agree some population perspective would be nice, or maybe economic production. But that would basically take cartograms, and short of that there isn't much else to get from a chart of state unemployment numbers.

(Stacked bars might work, sort of, but it would obscure the unemployment rate, which is important in this case.)

2) There is NO way to defend the use of area to display data if you're using the underlying number as the diameter. A chart is a 2-D medium, as is area. Therefore, area should represent the number - sam reason

I think if you're using circles, the area of which is not entirely intuitive at a glance, you should use whatever best matches the human perceptual system. That might be area in this case, but it might be something else. I wouldn't jump to the conclusion that it's area.

Again, bad chart: yeah. Just not egregiously bad, incorrect chart.

Trying to compare rates of one state to another, say California to Wyoming IS a bad idea...especially when you're trying to show a National Trend.

Huh? Who's trying to show a national trend? I think they're obviously trying to show regional trends.

It's also not trying to show how much each state is contributing to the national problem - which is why population/workforce isn't so relevant. Just how bad it is in each state for whoever lives there.

(I agree a cartogram or something would be ideal, though.)

1) What is the point of doing this by State? Especially since we can't determine which bubble goes with which state! It might have been better to group rates by region.

Most of the problem there is just the bad scale, which might not even be the designer's fault.

The problem with dividing things up into regions is that you're telling, not showing, and giving less information (whatever regions you choose aren't going to allow for outliers like RI, for example). Ideally the regional differences are just obvious without being pointed out explicitly. (Which the bubbles sort of succeed at, though they could obviously be better.)

By jack lecou on 01 Apr 2009
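On the question raised above of whether circles should be scaled by area or by something tuned to perception: cartographers have studied exactly this for proportional-symbol maps. A commonly cited compromise is James Flannery's "apparent magnitude" scaling, which makes the radius proportional to the value raised to roughly 0.57 instead of 0.5, to offset the tendency to underestimate differences in circle area. Here's a sketch comparing the three options for the Texas and Florida rates (the exponent is the usually quoted value; the function name is illustrative):

```python
def circle_area_ratio(high: float, low: float, radius_exponent: float) -> float:
    """Ratio of circle areas when radius is proportional to value ** radius_exponent."""
    radius_ratio = (high / low) ** radius_exponent
    return radius_ratio ** 2  # area goes with the square of the radius


schemes = {
    "diameter proportional to value": 1.0,
    "area proportional to value": 0.5,
    "Flannery apparent-magnitude scaling": 0.5716,
}

for name, exponent in schemes.items():
    ratio = circle_area_ratio(8.6, 6.6, exponent)
    print(f"{name:<38} area ratio FL/TX = {ratio:.2f}")
```

Flannery's correction lands much closer to plain area scaling than to diameter scaling, so a perception-adjusted scale would still draw Florida's circle noticeably smaller relative to Texas's than the diameter encoding does.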

Area is clearly not equivalent to diameter, so the fact that one has to wonder about which represents the value makes it a poor graph. And I agree with Mark that my default when shown images of different area is to take the area as proportional to the value being represented. One reason circles are conventionally used to represent areas is that we are all used to pies, and we know that a pie twice the diameter has quite a bit more than twice as much filling. Since we are thinking about people, the natural way to think of the circles is as a group of people standing together, viewed from above--in which case the number of people would be proportional to area. Except, of course that it's not even number, it's a fraction. If you are showing a fraction, the most easily understood way to do it would be as a slice of pie, with the area of the slice corresponding to the proportion of the full pie with the same diameter. Second best would be as little vertical bars, in which case both the area and height are proportional to the value.

Of course, we all know why it was done this way, because the eye is sensitive to the area, and it looks like there are much bigger regional differences than there really are, which makes the graph more visually interesting.

I can see an argument for showing percent unemployment rather than total unemployment, because it is percent unemployment that is probably more relevant to an individual's risk of unemployment in a particular region, and also to the economic impact (because tax revenues also rise with population).

But consider the case where the unemployment rates are 8% in California and 12% in Wyoming. The use of diameters and percentages would indicate that the two are about the same or at least in the same ballpark. Hardly a huge difference right. However, given your numbers it means the number of unemployed in California is 50 times larger than the number in Wyoming. I would say that that is a much larger problem

Except in a very real sense it is not a larger problem. There's a very good reason we divide the number of unemployed people by the size of the workforce.

In your situation, California may have 50 times as many unemployed people, but it also has 72 times as many people. And something on the same order of magnitude times as many potential employers, customers, friends and relatives to help you out, etc., etc.

Basically, it may have 50 times as many unemployed people, but it also has something more like 72 times as much capacity to absorb them. In that scenario, unemployment is just not as big a problem in California as Wyoming.

Now, you can argue that basically all the unemployed people in Wyoming could simply move to California without making a big difference to California. Which is technically true, and in that sense Wyoming's problem is smaller - it contributes less to the national average. It's also smaller in a sort of sum-of-human-suffering sense, where clearly the misfortune of a couple million is worse than the misfortune of a few tens of thousands.

But geography matters. The fact is that all of the unemployed people in a small state are not just going to up and move to another state. And it is interesting to see how some states are more or less affected by the downturn than others, regardless of whether those states are hugely important in the scheme of things.

By jack lecou (not verified) on 01 Apr 2009

If you are showing a fraction, the most easily understood way to do it would be as a slice of pie, with the area of the slice corresponding to the proportion of the full pie with the same diameter. Second best would be as little vertical bars, in which case both the area and height are proportional to the value.
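The pie-slice claim checks out with a one-line derivation. For a pie of radius R, a slice representing fraction p of the whole spans an angle of 2πp, so its area is exactly p times the area of the full pie:

```latex
\theta = 2\pi p, \qquad
A_{\text{slice}} = \tfrac{1}{2} R^{2} \theta = p \, \pi R^{2}
```

Because the factor πR² is the same for every pie of the same diameter, slice areas compare exactly as the fractions do.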

I can certainly imagine a map that had stacked bars for the number of unemployed relative to the size of the workforce. Or pie charts of the same thing, with the area of the pie scaled relative to the size of the workforce. That would be interesting in some ways - it would be easy to compare the number of unemployed in different states, or the size of the workforces.

The problem is that it would not be so easy to compare the rates, as that would involve doing a little mental/graphical calculation on the two little bars for each state. It really obscures the interesting statistic, which is the percentage of unemployed (see above).

It's just really not correct to think of unemployment percentage as a "fraction" that always needs to be represented in a pie chart. It's an index that can stand by itself perfectly well.

By jack lecou (not verified) on 01 Apr 2009

jack lecou,

There's obviously no constant number of pixels required. If Wyoming had the highest unemployment, one would scale all the bubbles uniformly so that Wyoming's fit roughly inside Wyoming - and all the others would be even smaller.

Is there any explanation of the scaling method used?

If Michigan seems to have the highest unemployment rate and is the basis for the size of the other bubbles, why expect that the bubble for Wyoming would fit inside of Wyoming, roughly? And roughly may be the word for the chart, but I like egregiously. :-)

The bubble around Michigan looks as if it should have some sort of protective effect. Some sort of shield. Or maybe Michigan is going to travel in a bubble like Glinda the Good.

1) Using a non-zero baseline on ANY chart is a misleading way of presenting data.

That seems like a rather unlikely rule. Not to be snarky, but do you always graph temperatures in Kelvins?

A chart that gives the impression of a more dramatic change, by not using a zero baseline, is misleading.

Kelvins are measured from Absolute Zero. Celsius has zero degrees. Zero Celsius is not zero Kelvin, but it should be clear, to those using the Celsius temperature scale, what zero degrees Celsius is. The same is true of Fahrenheit. When you use the Kelvin scale, how many people really understand what zero degrees Kelvin means? Far fewer than those who understand the more conventional scales.

The chart should make it clear what the baseline is. If it were using a baseline of a long term average of unemployment, instead of zero, that could be acceptable, but only if it is properly labeled. For some things, using zero as the baseline will cause the changes in the data to be indistinguishable. That is an appropriate time to use a non-zero baseline. This is why few charts of change in temperature use zero degrees Kelvin.

What is being used as baseline on this chart? Could it be the square root of pi? That is difficult to say. Then there is the significance of such a profound number for a baseline. And why not use a negative baseline? A wavy baseline? A baseline that varies based on a complex formula? An optically illusory baseline? Or we could try something really bizarre, like zero.

Imagine charting the changes in outdoor temperature on a Kelvin scale. Even worse, indoor temperature changes. We don't need to heat the office in the winter. There is no noticeable difference in temperature on the chart - at least when you use a Kelvin scale. This chart is a paean to the arbitrary and obscure.

Trying to compare rates of one state to another, say California to Wyoming, IS a bad idea... especially when you're trying to show a national trend.

Huh? Who's trying to show a national trend? I think they're obviously trying to show regional trends.

That is one of the bad things about the chart. What is the chart trying to represent?

Ideally the regional differences are just obvious without being pointed out explicitly. (Which the bubbles sort of succeed at, though they could obviously be better.)

Could be better? Is that as far as you are willing to go?

Sort of succeed at? The Maginot Line sort of succeeded. The tanks had to drive around it. They did not come through it. So, the Maginot Line was an obvious success from the perspective of the troops stationed there. The rest of France was the problem.

The market in sub-prime mortgages was a success, up until 2008. There are many examples of success that are only examples of success if you don't look at them too closely, or test them under real-world conditions.

The biggest problem is just the desire to use a circle to represent a line. Why not use octagons? We are familiar with octagons in everyday use. They are STOP signs. If we wish to stop unemployment, what could be better?

Why not parallelograms? Mobius strips? Images of bacterial colonies being treated with antibiotics?

There is no good reason to encourage this use of bad simile/metaphor/analogy/synecdoche, or any other substitution that only confuses the data represented. What justification is there for the people responsible for this chart to have jobs, when so many other people do not? This is a bad chart.

Is there any explanation of the scaling method used?

The scaling method for making all the bubbles reasonable sizes is multiplication by a scalar constant. Hardly mysterious.

That doesn't affect the sizes of the bubbles relative to each other, so it is obviously separate from the complaints about the baseline or diameter/area.

A chart that gives the impression of a more dramatic change, by not using a zero baseline, is misleading.

So, if I graph a time series of, say, weight or body temperature, and use a sensible non-zero baseline in order to zoom in on the region of interest, I would be making the changes look more dramatic and misleading? Seriously?

This is scaling in the same way. Without it, it might be difficult to see that MI is really the worst, and not, say, SC. Or that WI is less than MN.

What you actually seem to be objecting to is the lack of a clear scale. Which would be a fair complaint, except you don't really know there isn't one, because we don't have the full--interactive--chart, just a blurry static snapshot.

That is one of the bad things about the chart. What is the chart trying to represent?

Regional trends in unemployment. Duh. Like this one. (Which also, OMG, uses a non-zero baseline.)

Could be better? Is that as far as you are willing to go?

Yes. I'm not defending the chart per se. What got my goat in the first place was that Mark used a lot more bad math to attack it than it contained in the first place.

In other words, there are certainly a couple of things to find fault with on that chart, it's just that for the most part they don't seem to be the same ones people here are actually faulting it for.

There's no bad metaphor, simile or anything else. As near as I can tell you're just cantankerously complaining about an unfamiliar (to you) interactive chart idiom.

By jack lecou (not verified) on 02 Apr 2009

Why is Jack Lecou so committed to defending such a bad chart? Who created this chart to begin with? Certainly someone without a copy of a Tufte book on their shelf.

He clearly has a limited understanding of what a non-zero baseline is, given the example he linked to.

A non-zero baseline is when you are comparing numbers such as 2.0% and 2.5% and you arbitrarily start the y-axis at, say, 1.0%. Now, instead of showing 2.0% vs. 2.5%, you are showing 1.0% vs. 1.5%, and the total area displayed appears to be an increase of 50% instead of 25%. It is a misleading way of displaying data.
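The distortion described here can be computed directly. A minimal sketch, assuming bar lengths are measured from the bottom of the axis; `apparent_increase` is a hypothetical helper, and the 2.0%, 2.5% and 1.0% figures are the ones from the comment.

```python
def apparent_increase(old, new, baseline=0.0):
    """Relative increase a reader perceives when bar lengths are measured
    from `baseline` instead of from zero. With baseline=0.0 this is simply
    the true relative increase."""
    return (new - baseline) / (old - baseline) - 1.0

true_change = apparent_increase(2.0, 2.5)                 # bars start at 0%
shown_change = apparent_increase(2.0, 2.5, baseline=1.0)  # bars start at 1.0%

print(f"true increase: {true_change:.0%}, apparent increase: {shown_change:.0%}")
```

With `baseline=0.0` the function returns the true relative change of 25%; moving the baseline up to 1.0% doubles the apparent change to 50%.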

The plot linked to is far superior to the one Mark showed, and doesn't suffer from the problem of the non-zero range because it is not showing the relative difference of one state to another... it is merely highlighting (or coloring) which ones fall within a certain range.

By jjsocrates (not verified) on 06 Apr 2009

It's just really not correct to think of unemployment percentage as a "fraction" that always needs to be represented in a pie chart.

It escapes me how an index that represents one number as a fraction of another can reasonably be regarded as anything other than a "fraction."

It escapes me how an index that represents one number as a fraction of another can reasonably be regarded as anything other than a "fraction."

Let me see if I can help you out.

The Brix scale is a ratio - 25 Brix means there are 25 grams of sugar and 75 grams of water in 100 grams of solution. Does that mean we have to use pie charts if we wanted to make an illustration of, say, the sweetness of different fruits? How about the heat of chili peppers with the Scoville scale?

Other small concentrations of chemicals are often measured in parts per million - a fraction! If we wanted to make some kind of chart similar to the above, would we have to use pie charts with tiny, nigh-invisible wedges? And things like sodium levels in blood are often measured in milliequivalents per liter. Another fraction! Obviously we would always have to graph these as fractions of a person's total blood volume.

If we wanted to make a chart illustrating cholesterol levels in different countries (usually mg/dL or mmol/L) would we have to have pie charts that showed tiny little wedges in circles scaled for the total volume of blood in Romania?

And hey, heart rate. That's beats per minute. I smell pie chart.

Tax rates, growth rates, interest rates? More pie charts!

Or how about proof (e.g., of alcohol), slope (rise over run!), Hertz (same as heart rate, really), velocity (meters per second!), acceleration (m/s/s - which two numbers do we even graph?!), jerk (m/s/s/s!)?

Every quantity that comes from the ratio of two numbers always has to be represented as a fraction? Really?

By jack lecou (not verified) on 07 Apr 2009

Why is Jack Lecou so committed to defending such a bad chart? Who created this chart to begin with? Certainly someone without a copy of a Tufte book on their shelf.

I'm not defending the chart. Feel free to attack the bad parts.

But dumb attacks are dumb attacks, and fuzzy thinking is fuzzy thinking, bad chart or no.

A non-zero baseline is when you are comparing numbers such as 2.0% and 2.5% and you arbitrarily start the y-axis at, say, 1.0%. Now, instead of showing 2.0% vs. 2.5%, you are showing 1.0% vs. 1.5%, and the total area displayed appears to be an increase of 50% instead of 25%. It is a misleading way of displaying data.

I know what changing the axis of the chart does. But whether it's misleading or not depends totally on the context - the labeling, the intent of the chart, etc.

Presumably you wouldn't object to graphing body temperatures on an axis from, say, 95-105 deg F, right? And yet that's exactly the same kind of supposedly misleading distortion. There's simply no bright line rule.

You have to actually make a case about WHY it's misleading in this specific context. Not just claim that any transgression of this supposed "baseline rule" is automatically misleading.

The plot linked to is far superior to the one Mark showed, and doesn't suffer from the problem of the non-zero range because it is not showing the relative difference of one state to another... it is merely highlighting (or coloring) which ones fall within a certain range.

So, wait, this chart would be ok if instead of smoothly scaled circles, there were a half dozen discrete sizes?

Non-zero baselines are ok if and only if you use cubby holes?

By jack lecou (not verified) on 07 Apr 2009

(I got a little carried away and sloppy with some of the ratios above - you obviously can't make a pie chart from beats per minute, for example, because the units are different. I probably should have confined myself to ratios that come out unitless.

I hope the basic point is clear, though - lots of things are technically fractions or ratios, but we don't always have to think of them as such.)

By jack lecou (not verified) on 07 Apr 2009

jack lecou,

Sorry, I thought I had responded to this, but I guess I did not hit post.

Is there any explanation of the scaling method used?

The scaling method for making all the bubbles reasonable sizes is multiplication by a scalar constant. Hardly mysterious.

That doesn't affect the sizes of the bubbles relative to each other, so it is obviously separate from the complaints about the baseline or diameter/area.

Hardly mysterious? Hardly labeled. If this chart were so obvious, there would not be such a long debate with only one person defending this failure.

A chart that gives the impression of a more dramatic change, by not using a zero baseline, is misleading.

So, if I graph a time series of, say, weight or body temperature, and use a sensible non-zero baseline in order to zoom in on the region of interest, I would be making the changes look more dramatic and misleading? Seriously?

This is scaling in the same way. Without it, it might be difficult to see that MI is really the worst, and not, say, SC. Or that WI is less than MN.

What you actually seem to be objecting to is the lack of a clear scale. Which would be a fair complaint, except you don't really know there isn't one, because we don't have the full--interactive--chart, just a blurry static snapshot.

Now I remember why I did not hit post. I was looking for some video on optical illusions, because circles are an excellent way of misrepresenting data. Unfortunately, I could not find what I was looking for. I believe it was from Dan Ariely's book, Predictably Irrational. I could not find anything among his videos available online.

And you give an excellent example of how to take a quote out of context and use it to create a straw man. If you read what I wrote after the sentence you quoted, it should be obvious that I was not criticizing non-zero baselines, but the inappropriate and misleading use of non-zero baselines. It is similar to taking a quote out of context to represent the opposite of what was intended.

You claim that these bubbles make the information clear. Perhaps they are useful to someone already familiar with the data, but then what would be the point? This chart is not clear.

That is one of the bad things about the chart. What is the chart trying to represent?

Regional trends in unemployment. Duh. Like this one. (Which also, OMG, uses a non-zero baseline.)

Great Googly Moogly! A chart that actually labels information, so that people might understand what is going on - at least people not involved in creating the chart.

There's no bad metaphor, simile or anything else. As near as I can tell you're just cantankerously complaining about an unfamiliar (to you) interactive chart idiom.

I am cantankerous sometimes. I feel completely justified throwing poo at this larger pile of poo.

The purpose of a chart is to present data in a way that is more easily understood than if the data were presented in a table. This chart requires a table to explain what is going on.

Hardly mysterious? Hardly labeled. If this chart were so obvious, there would not be such a long debate with only one person defending this failure.

You're confusing two different things:

1. The scaling of the final shapes to fit onto the map

2. The mapping from the underlying data into the relative sizes for the different shapes, i.e., choice of baseline, choice of diameter vs. area, etc.

#1 is what I was referring to as not mysterious. It's just the difference between world coordinates and display coordinates. Ordinary scaling. This is how you get all the circles to fit reasonably on the map, without one circle being huge.

E.g., if your largest circle is MI, and it has 10x the diameter of the smallest, you scale both by a factor so that the circle for Michigan fits inside the state boundaries, and then the smallest circle will be 10x smaller. (Obviously if you use area instead, you have to scale to keep relative area constant, but the principle is the same - you can adjust the largest circle to fit the map and the others follow, without changing the chart meaning.)
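The two steps distinguished above can be separated in code. This is a minimal sketch, not the chart's actual method (which we don't know): `radii` and `area_ratio` are hypothetical helpers, and the 12% and 6% rates are made up. Step #2 is the choice of `encode_area`; step #1 is the single `scale` factor, which plays the role of a display-size slider.

```python
import math

def radii(values, max_radius, encode_area=True):
    """Map data values to circle radii so the largest value gets `max_radius`.

    encode_area=True  -> circle AREA is proportional to the value (radius ~ sqrt(value)).
    encode_area=False -> circle DIAMETER is proportional to the value (radius ~ value).
    The final multiplication by one constant is the display-scaling step: it changes
    how big the circles are on the map, not how they compare to each other.
    """
    raw = [math.sqrt(v) if encode_area else v for v in values]
    scale = max_radius / max(raw)
    return [r * scale for r in raw]

def area_ratio(r1, r2):
    """Ratio of the areas of two circles with radii r1 and r2."""
    return (math.pi * r1 ** 2) / (math.pi * r2 ** 2)

rates = [12.0, 6.0]  # hypothetical: one state has twice the rate of another

for encode_area in (True, False):
    for max_r in (10.0, 40.0):  # two different "slider" settings
        big, small = radii(rates, max_r, encode_area)
        print(f"encode_area={encode_area}, max_radius={max_r}: "
              f"area ratio = {area_ratio(big, small):.1f}")
```

Changing `max_radius` leaves the area ratio untouched, while switching from an area encoding to a diameter encoding turns a 2:1 difference in rates into a 4:1 difference in ink, which is the complaint in the original post.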

This is why your statement about Wyoming being off the map doesn't make sense. If Wyoming is large, you just scale it and everything else until it fits.

#2 is where the controversy is.

And you give an excellent example of how to take a quote out of context and use it to create a straw man. If you read what I wrote after the sentence you quoted, it should be obvious that I was not criticizing non-zero baselines, but the inappropriate and misleading use of non-zero baselines. It is similar to taking a quote out of context to represent the opposite of what was intended.

Reading back, it seems I may have skimmed over too hastily the paragraphs below the discussion of Kelvin:

The chart should make it clear what the baseline is. If it were using a baseline of a long term average of unemployment, instead of zero, that could be acceptable, but only if it is properly labeled.

It wasn't my intention to create a strawman.

It seems we are agreed that baselines don't need to be zero - it depends on the chart.

And I still don't see a convincing argument about how this particular chart is misleading. It looks like the baseline was chosen to fit the data and to exaggerate the differences. Is this misleading? It does break the metaphor of the circle a bit (zero size doesn't mean zero unemployment), which is bad, and you can't tell at a glance that unemployment is X times higher in this state than in that state.

But then, you have exactly the same problems with charts with discrete size steps, and/or color based charts, neither of which seem to meet with so much disapproval.

And there are tradeoffs. It's possible that a chart with a zero baseline would not have shown differences or rank between states as well. (Again, look at MN and WI.)

The circles themselves are an odd choice, but I realized the other day that that's probably because of the animation feature - I expect our visual system can't track changes in color over time as well as it can track shapes. (The animation idea may or may not be misguided, but hey, none of us have even seen it.)

The most solid complaint seems to be about the apparent lack of a clear scale, but again, that might be in some kind of interactive popup we don't know about. At the very least, there seem to be popups for each state to display the unemployment number. Awkward and a little obfuscatory, but enough to clear up any confusion quickly enough.

Again, if the chart is creating a false impression, what exactly is it?

You claim that these bubbles make the information clear. Perhaps they are useful to someone already familiar with the data, but then what would be the point? This chart is not clear.

Shrug. Charts are an abstract visual language, and some background understanding is always necessary to understand one. I don't think any familiarity with the data is necessary in this case, though.

Coming at the chart from my point of view, having seen the charts with circles before, and some familiarity with the sort of figures used to look at unemployment, the meaning of the chart and the data was immediately clear to me.

I was frankly baffled when I read Mark's interpretation - he seems less than familiar with the use and abuse of unemployment numbers. And you and he both seem to have a very conservative, prescriptivist attitude toward the use of circles in chart language.

But it's really not confusing when you know the lingo.

By jack lecou (not verified) on 08 Apr 2009

You're confusing two different things:

1. The scaling of the final shapes to fit onto the map

2. The mapping from the underlying data into the relative sizes for the different shapes, i.e., choice of baseline, choice of diameter vs. area, etc.

#1 is what I was referring to as not mysterious. It's just the difference between world coordinates and display coordinates. Ordinary scaling. This is how you get all the circles to fit reasonably on the map, without one circle being huge.

We clearly have different definitions of reasonable. True, you do not have a problem of just one circle being huge. The whole right side of the map is overlapping huge circles. Well, some are huge, some may not be, but who can really say? The person who designed this can say. We could ask him, but when the chart creator is not available, we have to assume many things.

E.g., if your largest circle is MI, and it has 10x the diameter of the smallest, you scale both by a factor so that the circle for Michigan fits inside the state boundaries, and then the smallest circle will be 10x smaller. (Obviously if you use area instead, you have to scale to keep relative area constant, but the principle is the same - you can adjust the largest circle to fit the map and the others follow, without changing the chart meaning.)

I suppose that might make some sense if the circle for Michigan clearly fits inside the state boundaries. Maybe you do not see it, but it seems to me that the whole of Michigan (even the Upper Peninsula) would fit inside that bubble. The alignment is a bit off, but then the whole chart seems a bit off to me.

This is why your statement about Wyoming being off the map doesn't make sense. If Wyoming is large, you just scale it and everything else until it fits.

I guess it depends on how much you presume prior to looking at the chart. Not having created the chart, I approach it looking for information, rather than just applying what I already know about the chart or the style of the person creating the chart. A good chart is made to provide information, not to tell me what I already know or confirm what I believe.

Presenting the data so that the largest state unemployment rate fits the size of the state might make some sense. What if it were Rhode Island that had the largest unemployment rate? Then how would this scaling method work? You have to jump to a lot of conclusions to get this chart to make sense. That is not good charting.

It wasn't my intention to create a strawman.

Sorry, I have been arguing with anti-vaccine people for a few days and I suppose I am starting to project their behavior onto others. I should avoid them for a while, otherwise I might start projecting this onto family. I apologize.

It looks like the baseline was chosen to fit the data and to exaggerate the differences. Is this misleading?

What is misleading is that it does not make these things clear. I can look at a table of this information and get a better idea of the unemployment situation in the different parts of the country. A chart should provide quick, clear information. While some of the information might be available through the interactive features, relying on them does not convince me that this chart is informative. The baseline is not clear. How much does it take to make it convenient for the viewer to see what the baseline is?

But then, you have exactly the same problems with charts with discrete size steps, and/or color based charts, neither of which seem to meet with so much disapproval.

I gave an example of a site that provides graphs that I like. What I expect from a graph is neatly organized concise information. Some charts may have further information available when examining the chart in more depth.

Looking at this chart, my first thought was: what's with all the bubbles? They were more of a distraction than anything else. This is not information that is easy to present, but the chart you presented of unemployment data seemed much more clear and concise. It made it clear what was the highest, what was the lowest, and what the range was. That the baseline was not zero could easily be seen.

And there are tradeoffs. It's possible that a chart with a zero baseline would not have shown differences or rank between states as well. (Again, look at MN and WI.)

I'm not really concerned that it is a zero baseline, but that it is an ambiguous baseline. A picture can say a thousand words, or even raise a thousand questions, but this stops me from getting to the main part of the message.

I start looking at the things that seem to be confusing. I look at things differently from most people. I spend a lot of time trying to explain the way I see things. Sometimes I get it right. This chart is not something I would use to present information.

Would this work if a tiny state had the highest unemployment rate? What about California?

At the very least, there seem to be popups for each state to display the unemployment number. Awkward and a little obfuscatory, but enough to clear up any confusion quickly enough.

I do not think it is a good idea to rely on the interactive features. If I am using a laptop with a mousepad, the interactive features, where I have many information points close together, will drive me crazy. I am horrible with a mousepad and try to use a mouse whenever possible, but that is not always practical. This filling the screen with interactive features is the kind of thing that discourages me from using some sites.

Again, if the chart is creating a false impression, what exactly is it?

The information is not clear. The other chart you showed made the right side of the chart easy to see. I could quickly see the relative rates for all of the right side of the map.

Shrug. Charts are an abstract visual language, and some background understanding is always necessary to understand one. I don't think any familiarity with the data is necessary in this case, though.

Familiarity with the data is what I expect to be provided by the chart. Having information about what the circles are, or what the size is based on, would be nice. From what I see, if the unemployment rate were to double everywhere, no bubble would change. The bubbles would not indicate any difference at all, but the problem would be much worse. As with many other changes, a doubling of the unemployment rate would not mean that things are twice as bad, but many times worse. The bubbles would not do a thing to present this information. I do not feel that I am being provided with the kind of information that puts things into a helpful perspective. This is not something that I can return to and notice how things have changed.
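The doubling point follows directly if, as assumed here, the bubbles are normalized so that the largest value sets the size; the chart's actual scaling is not documented. A minimal sketch with a hypothetical `normalized_sizes` helper and made-up rates:

```python
def normalized_sizes(values):
    """Scale values so the largest maps to 1.0, as a fit-to-map step might."""
    top = max(values)
    return [v / top for v in values]

rates = [6.6, 8.6, 12.0]          # hypothetical state unemployment rates (%)
doubled = [2 * r for r in rates]  # every state's rate doubles

# The two lists of sizes are identical: normalizing by the maximum
# throws away the absolute level and keeps only the relative differences.
print(normalized_sizes(rates) == normalized_sizes(doubled))  # prints True
```

The same holds for any encoding whose scale is refitted to the data, including a color scale, so it is a property of the normalization rather than of circles specifically.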

Coming at the chart from my point of view, having seen the charts with circles before, and some familiarity with the sort of figures used to look at unemployment, the meaning of the chart and the data was immediately clear to me.

On the right side of the map, the circles remind me of the cold war maps of the effect of the detonations of nuclear weapons.

I was frankly baffled when I read Mark's interpretation - he seems less than familiar with the use and abuse of unemployment numbers. And you and he both seem to have a very conservative, prescriptivist attitude toward the use of circles in chart language.

I suspect that there are several types of people when it comes to looking at visual data. Circles do not work for me, with few exceptions. Pie charts fall into the same category, again with few exceptions.

But it's really not confusing when you know the lingo.

But an essential part of the lingo, as with any means of communication, is providing a key. The key does not have to be complex (I would prefer it to be simple), but it needs to be there to help the typical member of the audience understand the material. Some charts are designed to raise questions, but this one seemed to raise the kind of question it should be answering rather than raising.

I like charts. They tend to get to the point quickly. I do not think that this chart got to the right point.

We clearly have different definitions of reasonable. True, you do not have a problem of just one circle being huge. The whole right side of the map is overlapping huge circles. Well, some are huge, some may not be, but who can really say? The person who designed this can say. We could ask him, but when the chart creator is not available, we have to assume many things.

Why is this mysterious? It has nothing to do with the data.

The NYT had a chart like this with county level results after the election. There was a slider you could play with to scale the circles up or down, either make them really tiny so none overlapped, or make them huge for a sort of bubbly marshmallow-man effect.

I expect there was a similar slider here, either in the chart creation software, or, very possibly, on the viewer's side. There's an error here, but we don't even know if the fault was the chart creator's, or the guy who took the screenshot.

I suppose that might make some sense if the circle for Michigan clearly fits inside the state boundaries. Maybe you do not see it, but it seems to me that the whole of Michigan (even the Upper Peninsula) would fit inside that bubble.

Ditto.

Presenting the data so that the largest state unemployment rate fits the size of the state might make some sense. What if it were Rhode Island that had the largest unemployment rate? Then how would this scaling method work? You have to jump to a lot of conclusions to get this chart to make sense. That is not good charting.

You don't necessarily just scale it so the largest bubble fits entirely inside the state. Remember the slider? You slide that back and forth until you get a result that looks right, where the bubbles don't overlap too much, are at least mostly inside the state boundaries, etc. (And I would suggest that for some of the smaller eastern states, their bubbles should be located offshore, with arrows to the actual state.)

But again, we don't even know whether the viewer or designer has control of that slider.

And NONE OF THIS SCALING HAS ANYTHING TO DO WITH THE DATA. The circles all retain exactly the same proportions. No "assumptions" are necessary. It's like decisions about whether map boundaries should be thick or thin lines. Aesthetics. (Which can obviously affect the chart's clarity sometimes, like when the bubbles overlap, but has nothing to do with "assumptions".)

I have been arguing with anti-vaccine people for a few days

Yikes. My hat's off to you.

What is misleading is that it does not make these things clear. I can look at a table of this information and get a better idea of the unemployment situation in the different parts of the country.

Well, maybe you can. But I doubt I could, not without some staring. The whole idea is the geographic relationships, which are not present in a table. Talk about needing lots of background information...

Looking at this chart, my first thought was: what's with all the bubbles? They were more of a distraction than anything else. This is not information that is easy to present, but the chart you presented of unemployment data seemed much more clear and concise. It made it clear what was the highest, what was the lowest, and what the range was. That the baseline was not zero could easily be seen.

I'd probably use a colored chart or something myself.

What I think we have to keep in mind is that this chart evidently has some kind of time-series animation of these circles. I expect that wouldn't work quite as well with colors. (And maybe it doesn't work well at all, but we haven't seen it.)

I do not think it is a good idea to rely on the interactive features....This filling the screen with interactive features is the kind of thing that discourages me from using some sites.

Again, that's reasonable. But my basic point is that we just have no context to criticize some of this stuff. We've got one little screenshot from some kind of Flash application or something. No link to the page where it was found. Nothing. Who knows what kind of exculpatory context is either in the interactive chart, or on the page. (Maybe, even probably, none, but I don't feel comfortable declaring it "bad math" until I see it.)

The information is not clear. The other chart you showed made the right side of the chart easy to see. I could quickly see the relative rates for all of the right side of the map.

Again, that seems to be a simple aesthetic scaling issue. Might even be a slider to adjust that if we had the real chart...

Familiarity with the data is what I expect to be provided by the chart.

Right. I meant no prior familiarity is necessary, i.e., it's not necessary to know what the regional pattern is.

However, some familiarity with the dynamics of unemployment, and an expectation that there might be regional variation, is probably a reasonable and necessary part of the background.

Having information about what the circles are, or what the size is based on, would be nice. From what I see, if the unemployment rate were to double everywhere, no bubble would change. The bubbles would not indicate any difference at all, but the problem would be much worse....The bubbles would not do a thing to present this information. I do not feel that I am being provided with the kind of information that puts things into a helpful perspective. This is not something that I can return to and notice how things have changed.

Similarly, on a color coded map, if unemployment doubled everywhere, the color scale would not change.
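Both observations hold only if the display is scaled relative to the data it shows, for example by sizing (or coloring) each state relative to the largest value on the map. We don't know whether this chart actually works that way; the sketch below simply assumes that normalization. Under that assumption, doubling every rate produces exactly the same picture. The function name `normalized_sizes` is hypothetical; the two rates are the ones quoted in the post.

```python
import math

def normalized_sizes(values):
    """Assumed chart behaviour: each value is scaled relative to the largest one."""
    top = max(values.values())
    return {state: value / top for state, value in values.items()}

rates = {"Florida": 8.6, "Texas": 6.6}                 # percent, as quoted in the post
doubled = {state: 2 * rate for state, rate in rates.items()}

before = normalized_sizes(rates)
after = normalized_sizes(doubled)

# Relative sizes are identical, so a uniform doubling would be invisible.
for state in rates:
    print(state, round(before[state], 3), round(after[state], 3))
```

A chart with a fixed scale (for instance, a legend pinned to absolute percentages) would show the change; a self-normalizing one, whether bubbles or colors, would not.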

You're basically just trying to misuse this chart. It's showing regional differences in unemployment. Not something else.

On the right side of the map, the circles remind me of the cold war maps of the effect of the detonations of nuclear weapons.

The whole thing reminds me of nuclear detonations. That's probably the visual metaphor I'm using, and why it doesn't suggest bacteria colonies or whatever to me.

But an essential part of the lingo, any means of communication, is to provide a key. The key does not have to be complex, I would prefer it to be simple, but it needs to be there to help the typical member of the audience to understand the material. Some charts are designed to raise questions, but this one seemed to raise the kind of question it should be answering rather than raising.

I don't think that's true. That's what I mean by lingo. Scatter plots or pie charts or color-coded maps would ALL take quite a bit of puzzling out if you were seeing them for the first time and were not already familiar with them, probably even educated on them in school.

Charts don't explain every aspect of their visual metaphors in a key, any more than I provide a definition every time I use a word. It's part of the language.

By jack lecou on 09 Apr 2009

I suspect that there are several types of people when it comes to looking at visual data. Circles do not work for me, with few exceptions. Pie charts fall into the same category, again with few exceptions.

I suspect there are, and I can understand that.

What I DON'T think is that the difficulty some might have with certain types of charts necessarily makes those types of charts "bad". Certainly I wouldn't say pie charts are always bad just because there are some out there who find them difficult to read.

If you find a particular chart hard to read, find another one that presents the data differently. But it'd be a little solipsistic to declare the first chart "bad math" (on that basis alone, anyway).

By jack lecou on 09 Apr 2009

But an essential part of the lingo, any means of communication, is to provide a key. The key does not have to be complex,

I don't think that's true. That's what I mean by lingo. Scatter plots or pie charts or color-coded maps would ALL take quite a bit of puzzling out if you were seeing them for the first time and were not already familiar with them, probably even educated on them in school.

Charts don't explain every aspect of their visual metaphors in a key, any more than I provide a definition every time I use a word. It's part of the language.

A map provides a key. What do dots on the page represent? If the map has different-sized dots to represent different populations, that is indicated in the key. What political boundary markers are used? What is the scale of distance? If different colors are used, what do the colors represent?

On the right side of the map, the circles remind me of the cold war maps of the effect of the detonations of nuclear weapons.

The whole thing reminds me of nuclear detonations. That's probably the visual metaphor I'm using, and why it doesn't suggest bacteria colonies or whatever to me.

In both the nuclear detonations and the bacterial colonies, the data in the circles would represent area, not diameter. I think that this is the part that I still find most misleading. The area is what we notice most, but the diameter is the extent of the information being represented.

If I spill something that forms a circular mess, my problem is the whole circle. The whole mess is the whole circle. In this chart, my problem is just the diameter.
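The area-versus-diameter complaint can be put in numbers. Assuming, as the post says, that each circle's diameter is proportional to the unemployment rate, the area of a circle grows with the square of the rate, so the visual (area) ratio between two states is the square of the real ratio. A common alternative is to make the diameter proportional to the square root of the value, so that area tracks the data. This is a generic sketch using the Florida and Texas rates quoted in the post, not a reconstruction of how the chart was actually drawn.

```python
import math

def area_from_diameter(d):
    """Area of a circle with diameter d."""
    return math.pi * (d / 2) ** 2

fl, tx = 8.6, 6.6  # unemployment rates quoted in the post (percent)

# Encoding 1: diameter proportional to the rate (what the post says the chart does).
data_ratio = fl / tx
area_ratio = area_from_diameter(fl) / area_from_diameter(tx)

# Encoding 2: area proportional to the rate, i.e. diameter proportional to sqrt(rate).
sqrt_area_ratio = area_from_diameter(math.sqrt(fl)) / area_from_diameter(math.sqrt(tx))

print(f"data ratio:                       {data_ratio:.2f}")
print(f"area ratio (diameter ~ rate):     {area_ratio:.2f}")
print(f"area ratio (diameter ~ sqrt rate): {sqrt_area_ratio:.2f}")
```

With these two rates, Florida's rate is about 1.3 times Texas's, but with diameter tracking the rate, Florida's circle covers about 1.7 times the area. Under the square-root encoding the area ratio falls back to about 1.3.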

If you find a particular chart hard to read, find another one that presents the data differently. But it'd be a little solipsistic to declare the first chart "bad math" (on that basis alone, anyway).

I guess we are just going to end up disagreeing on this. I do not like the chart. I think it is more likely to mislead, than inform. I will have to ask some other people to take a look at it to get some variety of opinions. I may try for a consensus of solipsists, if I can find enough. :-)

I think that you can look at information presentation and judge it as good or bad. Effective or ineffective.

A map provides a key. What do dots on the page represent? If the map has different-sized dots to represent different populations, that is indicated in the key. What political boundary markers are used? What is the scale of distance? If different colors are used, what do the colors represent?

Sure, a key, but again, we don't have enough of this chart to render a verdict on that. (And the chart does explain that the dots represent unemployment percentage, so the complaints, Mark's in particular, that one would expect them to represent actual populations are ludicrous.)

But note that a key doesn't provide a full explanation of the chart. For example, a scatterplot is expected to have clearly labeled axes, but it's not expected to have an explanation of the concept and assumptions behind Cartesian coordinates.

Aside from a theoretically valid complaint about the missing key (if it is really missing), a lot of the criticism here seems to fall into the latter category.

In both the nuclear detonations and the bacterial colonies, the data in the circles would represent area, not diameter. I think that this is the part that I still find most misleading. The area is what we notice most, but the diameter is the extent of the information being represented.

Possibly.

But again, even if we are sensitive to area, that basically amounts to using a kind of quadratic scale to highlight the interesting aspect of the data. And thus the complaint boils down again to one about the lack of a key.

But that's a complaint Mark, at least, never actually made.

I think that you can look at information presentation and judge it as good or bad. Effective or ineffective.

I think so too, but you have to judge it on criteria that are more objective than "I found this chart confusing".

By jack lecou on 09 Apr 2009

But note that a key doesn't provide a full explanation of the chart. For example, a scatterplot is expected to have clearly labeled axes, but it's not expected to have an explanation of the concept and assumptions behind Cartesian coordinates.

An explanation of using diameter, rather than area, is something that is not obvious and should be explained. Cartesian coordinates generally do not have this possibility of being interpreted in two different ways, but if the numbers presented are logarithmic, that would be expected to be noted.

I think that you can look at information presentation and judge it as good or bad. Effective or ineffective.

I think so too, but you have to judge it on criteria that are more objective than "I found this chart confusing".

Isn't the whole purpose of the chart to simplify and clarify?