In which the skewing of a data plot in Ron Unz’s epic investigation of college admissions makes me more skeptical of his overall claim, thanks to the misleading tricks employed.
Steve Hsu has a new post on a favorite topic of his, bias against Asians in higher ed admissions. This is based on a giant article by Ron Unz that I don’t have time to read, and illustrated with the graphic that’s the “featured image” for this post (which I will also reproduce below for the convenience of RSS readers).
What does this show? There’s a tangle of colored lines representing the fraction of Asian-American students at the various Ivy League schools over the period from 1990 to 2011, which are mostly clustered in the 15-20% range. There’s also a maroon-ish line representing the fraction of Asian students at Caltech, which climbs from a bit over 20% to just under 40% in the same period.
The argument, here, is that this is damning evidence of quotas for Asian students at elite universities, something Hsu has been talking about for a long time. This is supposed to be clinched by the dashed black line, representing the number of Asians in the general population.
I’m somewhat sympathetic to claims that Asians have a difficult position in higher education, but I hate this graphic as a way of trying to demonstrate it. If you look at it quickly, it seems convincing, but it’s actually doing a couple of sleazy things to over-sell its point, in a way that ultimately makes me less likely to accept the argument it’s supposed to support.
First and foremost, this is comparing apples to potatoes. That dashed black line showing the growth of the Asian population is actually the absolute number of Asian-Americans in the college-age demographic, which means this is a double-y-axis plot, one of the most annoying of all data graphs– I inevitably get turned around as to what data go with what axis, and end up taking twice as long to get the point as I need to. In this case, the dashed line, and only the dashed line, uses the right-hand vertical axis scale, showing the number of Asian-Americans in thousands, while everything else uses the left-hand scale, which is the percentage of Asian-American students at the various universities.
What’s wrong with this? Well, it’s not a fair comparison– you’re comparing a percentage to an absolute number. If the total population of the country were somehow constant, that might be a fair thing to do, but that’s not remotely the case– the population of the US has increased substantially over the last 20 years, from 249 million in 1990 to 309 million in 2010, according to Wikipedia, which is close enough for these purposes. The absolute number of Asian-Americans of college age has gone up, sure, but so has the absolute number of everyone. A fair comparison would need to look at Asian-Americans as a fraction of the population, not their absolute numbers.
Why the incorrect comparison? Probably because it makes for a more superficially convincing graph. Using absolute numbers lets whoever made the graph (Unz or somebody at The American Conservative) adjust the scale so it closely follows the Caltech line– and even goes busting out of the frame of the graph for the last data point, because the Asian-American population is exploding!– while plotting it as a percentage on the same scale as everything else almost certainly wouldn’t produce as close a match. If you scale the Asian-American numbers by the general population increase, the change is more like a 50% increase than the factor-of-two change in the absolute numbers. That’s probably too simplistic, because the Asian-American population might well be increasing faster than other groups, but it gives the general idea.
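To put a rough number on that scaling, here is a minimal sketch of the arithmetic. It uses the total US population figures quoted above as a stand-in for the college-age population, which is the same simplification already flagged as too simplistic, and it assumes the absolute count exactly doubled, which is my reading of the dashed line rather than a measured value. The function name is just for illustration.

```python
# Total US population figures quoted in the post (millions).
US_POP_1990 = 249.0
US_POP_2010 = 309.0

# Assumption: the absolute college-age Asian-American count roughly doubled.
# This is a reading of the dashed line on the graph, not a measured value.
ABSOLUTE_GROWTH_FACTOR = 2.0


def share_growth_factor(absolute_growth: float, pop_start: float, pop_end: float) -> float:
    """Growth in population share implied by a given growth in absolute numbers."""
    total_growth = pop_end / pop_start
    return absolute_growth / total_growth


total_growth = US_POP_2010 / US_POP_1990
share_growth = share_growth_factor(ABSOLUTE_GROWTH_FACTOR, US_POP_1990, US_POP_2010)

print(f"Total population grew by a factor of {total_growth:.2f}")
print(f"Implied growth in population share: factor of {share_growth:.2f}")
```

With these inputs the total population grows by a factor of about 1.24, so an exact doubling in absolute numbers works out to a share increase of about 60%. That is the same ballpark as the rough 50% figure, and well short of the factor of two the dashed line suggests.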
Along the same lines, plotting the Asian-American students as a percentage of the total on the same scale as everything else would reveal another thing: they’re vastly over-represented relative to their share of the overall population. Asian-Americans are something in the neighborhood of 5% of the total population. The college-age number will be a little different, but not by much– Unz’s epic article has a second graph showing the college-age percentages, and while it’s hard to read, Asian-Americans aren’t even 10%. So, putting the Asian-American fraction on the same axis would mean adding a small line way at the bottom, which would blunt the effect of the graph. A figure like this is as much an emotional appeal as a rational argument, and drawing visual attention to the fact that even the supposedly quota-limited Ivy League schools enroll Asian-American students at nearly four times their demographic share of the population would undercut that appeal.
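The over-representation figure is simple division, sketched here with the rough numbers from this post: a population share of about 5% and Ivy enrollment clustered in the 15-20% range. Both are approximations read from the text and the graph, not precise data, and the function name is hypothetical.

```python
# Rough figures from the post; both are approximations, not measured data.
POPULATION_SHARE_PCT = 5.0               # Asian-American share of the total population
IVY_ENROLLMENT_RANGE_PCT = (15.0, 20.0)  # where most of the Ivy lines cluster


def representation_ratio(enrollment_pct: float, population_pct: float) -> float:
    """How many times larger the enrollment share is than the population share."""
    return enrollment_pct / population_pct


low, high = (representation_ratio(p, POPULATION_SHARE_PCT) for p in IVY_ENROLLMENT_RANGE_PCT)
print(f"Enrollment is roughly {low:.0f}x to {high:.0f}x the population share")
```

The upper end of that range is where the "nearly four times" comes from. Using a college-age share somewhat above 5% would pull the ratio down a bit.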
(I’m not saying that Unz or Hsu are trying to conceal the demographic data, here– on the contrary, they’re both very forthright about the fact that even with the supposed quotas, Asian-Americans are overrepresented at elite universities. For conservatives (Unz is, after all, writing in The American Conservative), that’s actually a feature, not a bug– the argument is that Asians deserve to dominate higher education, either due to cultural factors (the tack Unz seems to be taking with his analogies to Jewish quotas in the 20th century) or inherited intelligence (a less common approach, as it easily slips into Creepy Charles Murray territory) that make Asians as a group better qualified than the African-American and Hispanic students who are underrepresented relative to their share of the total population, but get affirmative action preferences that Asian-Americans do not. That’s something spelled out in the text, though, and we’re talking about the graph, which is making a different kind of appeal.)
So, graphing everything as percentages is out. How about normalization, then? That’s a relatively honest approach to comparing unlike things– scale everything so the 1990 data have a value of 1, say, and show the growth since then. That runs into a different problem that’s concealed by the confusing presentation– the fraction of Asian-American students has substantially increased at Dartmouth and Princeton, and not changed much at all for what I think are Penn and Columbia (the muddy Excel color scheme makes it really difficult to trace specific schools through that spaghetti tangle of lines). It’s only Yale, Harvard, and Cornell that show a noticeable decrease in the Asian-American fraction over the period being graphed.
So, a normalized plot would show three lines going down, two going up dramatically (by about as much as the Caltech line), and three not doing much of anything. Again, that would blunt the emotional appeal of the graph. It’s better, for the purposes of the argument being made, to plot them all in a big tangle, which makes it clear that the top line has come down– look! quotas!– but makes it a little harder to see that the bottom line has come up. Conveniently, the two schools that have shown the big increases are also the two smallest Ivies (Dartmouth and Princeton) while two of the biggest (Harvard and Cornell) have come down, so the overall average still shows a decrease, when those numbers are quoted.
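For concreteness, the normalization described above can be sketched in a few lines. Every number below is a hypothetical placeholder chosen to show one rising, one flat, and one falling line; none of it is read off Unz's graph, and the school names and helper function are made up for the example.

```python
def normalize_to_baseline(series: dict[int, float], baseline_year: int) -> dict[int, float]:
    """Rescale a {year: value} series so the baseline year equals 1.0."""
    baseline = series[baseline_year]
    if baseline == 0:
        raise ValueError("baseline value must be nonzero")
    return {year: value / baseline for year, value in series.items()}


# Hypothetical placeholder percentages, NOT data from the graph.
enrollment_pct = {
    "rising_school": {1990: 10.0, 2000: 14.0, 2011: 18.0},
    "flat_school": {1990: 16.0, 2000: 17.0, 2011: 16.0},
    "falling_school": {1990: 20.0, 2000: 18.0, 2011: 16.0},
}
# Hypothetical absolute population count in thousands (also a placeholder).
population_thousands = {1990: 300.0, 2000: 450.0, 2011: 600.0}

normalized = {name: normalize_to_baseline(s, 1990) for name, s in enrollment_pct.items()}
normalized["population"] = normalize_to_baseline(population_thousands, 1990)

for name, series in normalized.items():
    print(name, {year: round(value, 2) for year, value in series.items()})
```

Once everything starts at 1, percentages and absolute counts can share a single axis, and a line that rises as fast as the population line is as easy to spot as one that falls.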
Looking closely at this graph, then, gives a somewhat different impression than the first impression it is designed to create. It actually works somewhat against the overall point, because looking at the slightly sleazy way the data are presented graphically makes me more skeptical about the overall argument. If the data were really clear and damning, they wouldn’t need to resort to How to Lie With Statistics graphical chicanery, would they? It might be that the numbers actually back up the story completely, when you go into all the details, but skewing the graphical presentation this way makes me more skeptical than I would otherwise be.
In the grand scheme of misleading presentation of dubious data, of course, this isn’t all that bad. It’s not in the same league as Fox News, for example– hell, it’s barely even the same sport. This is fairly subtle stuff, subtle enough that it’s even somewhat plausible that whoever put the graphic together didn’t consciously realize they were being deceptive. But the overall effect, on close inspection, is to raise more questions than it answers.