James Watson and the Myth of g

I’m not sure whether it was prompted by James Watson’s little outburst (for which he has apologized “unreservedly”) or just serendipity, but Cosma Shalizi offers an exhaustive demolition of the idea of a single general intelligence factor:

Anyone who wanders into the bleak and monotonous desert of IQ and the
nature-vs-nurture dispute eventually gets trapped in the especially arid
question of what, if anything, g, the supposed general factor of
intelligence, tells us about these matters. By calling g a
“statistical myth” before, I
made clear my conclusion, but none of my reasoning. This topic being what it
is, I hardly expect this will change anyone’s mind, but I feel
a duty to explain myself.

To summarize what follows below (“shorter sloth”, as it were), the case
for g rests on a statistical technique, factor analysis, which works
solely on correlations between tests. Factor analysis is handy for summarizing
data, but can’t tell us where the correlations came from; it always
says that there is a general factor whenever there are only positive correlations.
The appearance of g is a trivial reflection of that correlation
structure. A clear example, known since 1916, shows that factor analysis can
give the appearance of a general factor when there are actually many thousands
of completely independent and equally strong causes at work.
Heritability doesn’t distinguish these alternatives either. Exploratory factor
analysis being no good at discovering causal structure, it provides no support
for the reality of g.

It’s long, and comprehensive, and involves math, so it’s not for the faint of heart. It is, however, an excellent explanation of how statistical analysis can lead smart people astray.
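
For anyone who would like to see the core of the argument in miniature, here is a rough sketch of the kind of model Shalizi describes: lots of independent abilities, with each test drawing on a random subset of them. This is not his code. The function names, the number of people, abilities and tests, and the coin-flip rule for assigning abilities to tests are all assumptions made for illustration, and the leading eigenvalue of the correlation matrix stands in for a proper factor analysis.

```python
import math
import random


def simulate_scores(n_people=400, n_abilities=200, n_tests=8, seed=0):
    """Thomson-style sampling model with no built-in general factor."""
    rng = random.Random(seed)
    # Each test taps each ability with probability 1/2 (an arbitrary choice).
    subsets = [
        [i for i in range(n_abilities) if rng.random() < 0.5]
        for _ in range(n_tests)
    ]
    scores = []
    for _ in range(n_people):
        # Every ability is an independent standard-normal draw.
        abilities = [rng.gauss(0.0, 1.0) for _ in range(n_abilities)]
        # A test score is just the sum of the abilities that test taps.
        scores.append([sum(abilities[i] for i in subset) for subset in subsets])
    return scores


def correlation_matrix(scores):
    """Pearson correlations between the columns (tests) of `scores`."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    sds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in scores) / n)
        for j in range(k)
    ]
    return [
        [
            sum((row[a] - means[a]) * (row[b] - means[b]) for row in scores)
            / (n * sds[a] * sds[b])
            for b in range(k)
        ]
        for a in range(k)
    ]


def leading_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric positive matrix, by power iteration."""
    k = len(matrix)
    v = [1.0] * k
    eig = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        eig = math.sqrt(sum(x * x for x in w))
        v = [x / eig for x in w]
    return eig


if __name__ == "__main__":
    corr = correlation_matrix(simulate_scores())
    k = len(corr)
    off_diagonal = [corr[a][b] for a in range(k) for b in range(k) if a != b]
    print("smallest correlation between two tests:", min(off_diagonal))
    print("share of variance on the first component:", leading_eigenvalue(corr) / k)
```

Every pair of tests shares some abilities by chance, so all the correlations come out positive and one component soaks up a large share of the variance, even though nothing resembling a single general ability was put into the model.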


  1. #1 Cosma
    October 18, 2007


  2. #2 Uncle Al
    October 19, 2007

    1) Attend a Mensa RG or AG
    2) Get a copy of the Treasure Hunt
    3) Administer it to any selected cohort and compare said scores to the real thing.

    Intelligence does exist, it can be measured, it does make a difference. At Orange County (CA) Mensa Uncle Al’s teams came in first 6 years out of 9 consecutive. (“Knock it the Hell off and let somebody else win.”) “Exploratory factor analysis being no good at discovering causal structure, it provides no support for the reality of g.” Jews are 0.0025 of Earth’s population and 20% of Nobel Laureates. Were they unfairly aided by virulent anti-Semitism or is it merely a seven-sigma fluctuation? Say it softly… “bred to achieve.”

    You should fear.

  3. #3 John Novak
    October 19, 2007


    Uncle Al is a mensa member.

    Who knew?

  4. #4 andy.s
    October 19, 2007

    So, if you had to teach an intro physics class to a group of students with average IQ’s of 105 and a group with IQ’s around 140, you wouldn’t be able to tell the difference?

  5. #5 Jonathan Vos Post
    October 20, 2007

    “Jews are 0.0025 of Earth’s population and 20% of Nobel Laureates.”

    Assuming for the moment that the numbers are correct (I have not verified that), one would think this a stake in the heart of White Supremacists. They happily relegate those of African descent to subhuman status, ignoring that the evidence supports the contention that all humans are of African descent. But instead they assume that the Jewish Nobelist correlation proves that Jews control the media and government and science.

    The deeper problem in the public arena is not merely that intelligence measurement is riddled with contradictions, nor that “race” has no valid genetic definition. It is that, if there were genetic differences in intelligence (I am not saying that there are), this cannot be debated on the basis of evidence.

    It is not whether James Watson is right or wrong. When I saw him last month at Caltech, on his book tour, I noted that he was saying whatever he thought, regardless of whose ox was gored. He was as brutally dismissive of individuals and institutions as he was of Rosalind Franklin, who did deserve part of the Nobel prize.

    I am obviously no racist. I marched in Civil Rights demonstrations in the 1950s and 1960s. I had my Junior High give a play on the murder of Medgar Evers. My mother taught in an inner city Brooklyn elementary school. My wife taught in the all-black Government High School in Nassau, the Bahamas. I fought to teach summer school in an inner-city L.A. county high school.

    However, I have some sneaking sympathy for James Watson. Right or wrong, he is being pilloried for saying, or allegedly saying, something racist. His administrative duties are suspended at Cold Spring Harbor, which he ran. And the brouhaha has no connection to evidence.

    The best cure for hateful speech is evidence-based correct speech.

    The USA has devolved into a culture where one is not free to speak against George Bush, or say something questionable about women in Math (even if President of Harvard, whose speech I deplored), or race.

    I am something of a First Amendment absolutist. I must defend James Watson’s right to say something astonishingly stupid and controversial, but urge that those who disagree speak from evidence.

    There seems to be Global Warming. But we absolutely do NOT have an ability to scientifically predict the effect of huge policy changes, as we don’t have evidence on many of the critical feedback loops.

    It’s fascinating that Green party activists now support fission reactors for energy. It is bizarre, and without evidence, that Bush and Schwarzenegger believe that Hydrogen cars are a solution to an energy problem. Biodiesel cars may smell like french fries and keep Willie Nelson high and happy, but to provide fuel for all the world’s cars right now would take farming an area double that of the USA with soybeans, which is not exactly an alternative to Middle East and Russian oil. Can one critique Global Warming based on evidence? Not without risk to one’s career, the political deals having been cut.

    American Health Care is not evidence based. American education is not evidence based. “No Child Left Behind” provides evidence only that teaching to dumbed-down standard tests gives marginal improvement in scores on the dumbed-down tests, and that there appear to be differences in the rate of improvement of “white”, Hispanic, and African-American test scores, for which there is no budget to address.

    Nobel laureates are not universal experts. We’ve had a genuinely racist Nobel laureate before (Physics). If presidency of a major university, or a Nobel prize, cuts one no slack in saying something wrong, stupid, and politically incorrect, how long before the rest of the citizens of the USA have to self-censor everything we say, write, email, and blog?

    Hello, Orwell’s 1984!

  6. #6 Bailey Hankins
    October 21, 2007

    Let’s not be disingenuous here. You can truthfully state that all dogs descended from the same pack of wolves in Asia, but that doesn’t mean that a greyhound can’t outrun a chihuahua. The populations who did not benefit from the beneficial evolutionary changes that were easily passed east and west throughout Europe and Asia are just as out of luck as Europeans who weren’t around to get the physical attributes that evolved in Africa to make a great NBA star.

    Why should intellect alone be magical in its perfect distribution among locally evolved groups? Is it really inconceivable that certain populations are more intelligent as a whole than others??? Twin studies show that intelligence is very much inherited.

    Instead of being appalled that intelligence might not be “fairly” distributed, shouldn’t we be dumbstruck to discover allegedly intelligent people who believe that every single group of people who evolved on the planet are exactly equal in intellectual ability? That is what I find amazing!

    Watson didn’t say that ALL Africans are stupid. He spoke correctly. He spoke compassionately, with a true desire to help these people through genetics, to raise them up. He wasn’t being arrogant.

    Remember, you politically correct ninnies: you are the ones who are cruel in reality. Your stupidity and cowardice, your failure as scientists, are what will allow Africa to languish unnecessarily.

  7. #7 Chad Orzel
    October 21, 2007

    The claim being disputed here is not the idea of whether “intelligence” varies among human (sub)populations, but rather whether we have any means of measuring it. The argument that Cosma Shalizi and other responsible scientists are engaged in is an argument about whether the variance in IQ test scores is due to anything other than a variance in the ability to score well on IQ tests.

    In the linked article, Shalizi uses a simple model and a random number generator to show that most of the statistical arguments put forth for the validity of IQ tests are bogus. That doesn’t mean that there isn’t any variation in “intelligence” among populations, but it does mean that there isn’t any solid reason to believe that IQ tests are a meaningful measure of any such variation.

    It will take something more than methodologically flawed statistical arguments to put together a convincing argument for the importance of IQ. And, honestly, the fact that most of the people trying to make such an argument are either Mensans or creepy little racists doesn’t give me much confidence.

  8. #8 Bailey Hankins
    October 21, 2007

    Then you are in favor of dropping grades, SAT, ACT, and GRE scores from consideration in college admissions, correct? These are only tests. They correlate very highly with those completely bogus IQ tests. In fact, they are used for admissions to those dreaded, elitist, high-IQ groups like Mensa! From freshman admissions to graduate programs, colleges should select students by lottery, is this your assertion?

    I couldn’t help but notice that Shalizi is white. Couldn’t you find anyone of sub-Saharan African descent to bolster your argument?

    Cosma starts off with a falsehood:

    “In contrast, the best estimate for heritability of IQ is far lower (at about 0.34) than that of height (about 0.8).”

    He is, rather stupidly, comparing the IQs of children to parents, instead of using twins:

    “Bouchard and McGue (1981) have reviewed such correlations reported in 111 original studies in the United States. The mean correlation of IQ scores between monozygotic twins was 0.86”

    His claim that height has a 0.8 correlation is nothing but an outright attempt to deceive. If he used the same parent/child correlation he used for intelligence, then a mother with a much taller son (completely common) would crash his fake height correlation. He is not comparing apples to apples. He is basically attempting to lie using statistics. Of course parents don’t correlate perfectly to their children, they have different IQs!

    Cosma Shalizi has already been torn to bits on most boards. He isn’t a geneticist, he is a physicist.

    If IQ tests are worthless, then so are college degrees. What do they really measure other than the ability to pass college classes? Nothing. Maybe we should assign MDs by lottery as well. I’ll let you go first on the triple bypass surgery under your new world order.

  9. #9 Cosma
    October 21, 2007

    1. On the specific value of the heritability of IQ, please see the previous post, where I discuss the matter in detail. (I gave a link in the post under discussion, but apparently it wasn’t prominent enough.) Guessing that I was using the correlation between parents and children was ingenious, but wrong; that correlation is not, of course, generally equal to the heritability. The figure I’m using there comes from a meta-analysis of all the available correlational studies, including, of course, identical-twin-raised-apart studies. (That meta-analysis is not mine, but one done by fully-credentialed statistical geneticists, if that makes it go down any easier. In passing: taking a pile of reported correlations and averaging them is not how you learn anything from meta-analysis.) There are well-known methodological problems with the twin studies, which mean that the usual “direct” estimate of the heritability as the correlation between monozygotes raised apart is just wrong.

    2. This is tangential to the actual point at issue, that the heritability of a construct does not indicate its reality or causal importance in any way. If you read beyond that, you’ll see the actual arguments for this, in which the numerical values are plainly irrelevant.

    3. It’s perfectly true that I’m not a geneticist. Neither are, for example, Arthur Jensen, the late Richard Herrnstein, or Charles Murray. It’s also true that I used to be a physicist, but I’m in recovery now. So what?

    4. There’s a bit of a difference between assessing what someone knows about a particular subject, and some never-very-clearly-defined “aptitude” or “potential”: the former is much easier, and much more useful, than the latter. Nonetheless, as I have said repeatedly, there is a certain role for IQ tests as stop-gaps. I have also tried to explain how they can work as such, even if g is a figment.

    5. Would things be any different if I told you I just pass as white?

  10. #10 hellblazer
    October 24, 2007

    From BH:

    If IQ tests are worthless, then so are college degrees. What do they really measure other than the ability to pass college classes? Nothing. Maybe we should assign MDs by lottery as well. I’ll let you go first on the triple bypass surgery under your new world order.

    I find the jump of logic between sentences two and four quite astounding. There’s a good case that a lot of the value of going to college (is this secondary or tertiary?) *is* to learn how to pass classes, because this is a form of mental exercise and gives some practice at self-discipline, working habit, etc. Innate intelligence? If it exists and is well-defined (which as Cosma’s essays have argued, is highly debatable), I don’t see why I should respect it more than hard-earned knowledge or laboriously acquired skills.

    Put more bluntly: I couldn’t give a toss what an MD’s IQ is. I’d be much more concerned with his or her clinical experience and any malpractice history.

    P.S. maybe this is from another thread, but what’s the deal with Mensa? was this a big rhetorical point somewhere else?

  11. #11 Bruce
    October 26, 2007

    If IQ tests are worthless, then so are college degrees.

    g is general, degrees tend not to be, they tend to be specialised towards certain professions and require certain aptitudes. But then there are those pesky arts degrees ;-)

  12. #12 tc
    October 29, 2007

    On Shalizi’s post:

    First of all, I’d agree that g doesn’t tell us much about the evolution of the mind, or how the brain gives rise to reason, etc – by definition, g is about individual differences, not about human universals. But that’s precisely why people are so (rightfully) worked up about it – what Shalizi derides as “labor market sociology” is the whole reason why most of us care about g: why some people (or groups) might be richer or poorer or more successful than others.

    Now, in his simulation, Shalizi has 11 tests, each of which draws on anywhere from 1 to 500 shared abilities. From a psychologist’s point of view, this has no single “g factor” – but from the labor market sociologist’s point of view, who cares? What matters is that there is a set of abilities that affects _all_ the tests – we could take the average of the 500 shared abilities and label it “IQ”. A common factor is important because it means that there _is_ a single number you can use to predict all outcomes, and “multiple intelligences” and the like will not erase the predictive power of the common factor. And, if there are individual or group differences in this factor, then we should expect to see differences in outcomes.

    Finally, Shalizi doesn’t mention the sheer diversity of the types of tests that show a common factor – not just the usual academic tests, but also of musical ability, reaction time, etc. Sternberg spent a lot of time trying to come up with a test of “practical” or “emotional” intelligence that does _not_ load on g, without much success. I seem to recall that only rhythmic ability does not load on g – if there really are all these independent abilities, why hasn’t anyone come up with lots of tests that don’t load on g?

    Also, he says: The question is whether the index measures the trait the same way in the two groups. What people have gone to great lengths to establish is that IQ predicts other variables the same way for the two groups, i.e., that when you plug it into regressions you get the same coefficients. This is not the same thing, but it does have a bearing on the question of measurement bias: it provides strong reason to think it exists. As Roger Millsap and co-authors have shown in a series of papers going back to the early 1990s (e.g. this one from 1997, or this early treatment of the non-parametric case), if there really is a difference on the unobserved trait between groups, and the test has no measurement bias, then the predictive regression coefficients should, generally, be different. [15] Despite the argument being demonstrably wrong, however, people keep pointing to the lack of predictive bias as a sign that the tests have no measurement bias.

    This has been addressed. From the conclusion:

    We conclude that strict factorial invariance is tenable in comparisons of IQ test scores of blacks and whites. We base this conclusion on the finding that model A4, i.e., the least restrictive model incorporating SFI, fits reasonably well (see also Dolan, 2000). This is an important conclusion, because it implies that measurement bias, as defined by Mellenbergh (1989), is absent. Measurement bias, or content bias as Jencks and Phillips (1998) call it, is generally assumed to be absent (Jencks, 1998). It is nice to find support for this using the appropriate methodology.

  13. #13 John
    March 29, 2009

    Here is a critique which also mentions Shalizi’s comment, and why he is missing the point:

    “Jake, this is a good review and I agree with many of your major conclusions. However, your summary of the literature on g has several problems.

    [g-factor] is predicated on the notion that performance across different cognitive batteries tends to be positively correlated

    A quibble — the positive correlation between performance on different test items is not just a notion but an empirical observation that has been supported by millions of data points over the last century. More on this below.

    Psychological tests for g-factor use principal component analysis — a way of identifying different factors in data sets that involve mixtures of effects.

    Factor analysis, not PCA, is the method used by psychometricians. They are similar in principle but not in application.

    g-factor is very controversial.

    Not among intelligence researchers.

    In this review, we emphasize intelligence in the sense of reasoning and novel problem-solving ability (BOX 1). Also called FLUID INTELLIGENCE (Gf), it is related to analytical intelligence [1]. Intelligence in this sense is not at all controversial…

    [These authors go on to explain that in their view Gf and g are one and the same.]

    From another review:

    Here (as in later sections) much of our discussion is devoted to the dominant psychometric approach, which has not only inspired the most research and attracted the most attention (up to this time) but is by far the most widely used in practical settings.

    This was published over a decade ago. The psychometric approach has continued to attract the most research and attention and is still by far the most widely used.

    The second and broader critique of this work is whether the tests that we have for “intelligence” measures something useful in the brain.

    There’s wide agreement that the tests measure something useful about human behavior:

    In summary, intelligence test scores predict a wide range of social outcomes with varying degrees of success. Correlations are highest for school achievement, where they account for about a quarter of the variance. They are somewhat lower for job performance, and very low for negatively valued outcomes such as criminality. In general, intelligence tests measure only some of the many personal characteristics that are relevant to life in contemporary America. Those characteristics are never the only influence on outcomes, though in the case of school performance they may well be the strongest.

    A more standard criticism of g:

    while the g-based factor hierarchy is the most widely accepted current view of the structure of abilities, some theorists regard it as misleading (Ceci, 1990).
    that is:

    One view is that the general factor (g) is largely responsible for better performance on various measures [40,85]. A contrary view accepts the empirical, factor-analytic result, but interprets it as reflecting multiple abilities each with corresponding mechanisms [141]. In principle, factor analysis cannot distinguish between these two theories, whereas biological methods potentially could [10,22,36]. Other perspectives recognize the voluminous evidence for positive correlations between tasks and subfactors, but hold that practical, creative [142] and social or emotion-related [73] abilities are also essential ingredients in successful adaptation that are not assessed in typical intelligence tests. Further, estimates of individual competence, as inferred from test performance, can be influenced by remarkably subtle situational factors, the power and pervasiveness of which are typically underestimated [2,136,137,143].

    The concepts of IQ and g-factor have been questioned by several authors. Stephen Jay Gould actually wrote a whole book — The Mismeasure of Man — trying to debunk the assumption that intelligence can be measured in a single number. (For a more recent and excellent critique, I recommend this article by Cosma Shalizi.) The common theme among many of these critiques is that the tests for intelligence conflate numerous separable brain processes into a single number. As a consequence, 1) you aren’t sure what you are measuring, 2) you can’t associate what you are measuring with a particular region (the output may be the result of an emergent process of several regions), and 3) you may be eliding significant differences in performance across individuals that you would recognize with a better test.

    You give too much credit to Gould and Shalizi. Their primary criticisms are entirely less reasonable than the points you make.

    The main thrusts of their arguments are that test data do not statistically support a g-factor. Gould’s argument is statistically incompetent (for a statistician’s critique see Measuring intelligence: facts and fallacies by David J. Bartholomew, 2004). Shalizi’s criticism is incredibly sophisticated, but likewise incorrect. In a nutshell, Shalizi is trying to argue around the positive correlations between test batteries. If those correlations didn’t exist, his argument would be meaningful. However, as I noted above, these intercorrelations are one of the best documented patterns in the social sciences.

    significant differences in performance across individuals that you would recognize with a better test.

    It’s possibly not well known that enormous efforts have gone into trying to make tests that have practical validity for life outcomes yet do not mostly measure g. See for example the works of Gardner and Sternberg. The current consensus is that their efforts have failed. A notable exception might be measures of personality.


    Ultimately, we need to use biological measures such as cortical volume to determine what g really is. One possible approach is to combine chronometric measurements (e.g. reaction time) with brain imaging studies. Genetically informed study designs have a role to play here too.

