Pure Pedantry

I was at a wedding this weekend, and I got into one of those conversations that drunk people get into at weddings: what are the gender differences in cognition?

OK, so maybe you don’t get into conversations like this with people you don’t know well, but I do.

Anyway, it got me thinking that I should post a summary of what is known. Much has been argued about the relevance of differences in cognition. Larry Summers lost his job over it. Ben Barres wrote a lovely editorial about it — that we all talked about at length (my stuff is here, here, and here). The Economist has an article on it — directed towards lay people — which I recommend except for one notable area. (They suggest that intelligence in men is located in the white matter while in women it is located in the gray matter. This is not only untrue, it betrays a complete ignorance of the cellular makeup of the brain and a flagrant reification of intelligence in a brain structure. Intelligence is an abstraction. It doesn’t live anywhere.)

But what are the differences between men and women?

Most of the people arguing, at least in the popular press, that there are intrinsic differences in cognition base their ideas on the more recent work of Simon Baron-Cohen, a researcher at Cambridge. Baron-Cohen is known for his “Extreme Male Brain” theory of autism. According to the theory, males more often have brains that are organized to be better at systematizing, and females more often have brains that are organized to be better at empathizing. Individuals with autism show symptoms that suggest their brains have been pushed radically towards systematizing over empathizing. Thus, autism is a disease of the extreme male brain. (I have always found it odd that he didn’t refer to it as “the extreme systematizing brain” and skirt this whole issue of gender, but that is an issue for another day.) A review article summarizing this hypothesis is available here.

On the other side you have Janet Hyde at the University of Wisconsin-Madison. Dr. Hyde recently published a meta-analysis of all the data related to gender differences in cognition, and she has developed an alternative hypothesis, the so-called gender similarities hypothesis. Her hypothesis is basically that this whole idea that men and women think differently is a bunch of hooey. In her own words:

The gender similarities hypothesis holds that males and females are similar on most, but not all, psychological variables. That is, men and women, as well as boys and girls, are more alike than they are different. In terms of effect sizes, the gender similarities hypothesis states that most psychological gender differences are in the close-to-zero (d < .10) or small (0.11 < d < .35) range, a few are in the moderate range (0.36 < d < 0.65), and very few are large (d 0.66-1.00) or very large (d > 1.00).

Before I show you her data you should understand a key numerical measure of difference called Cohen’s d. Cohen’s d measures how far apart the means of two groups are, expressed in units of standard deviation. For any trait that you could measure in a population there is going to be a certain spread; not everyone is exactly alike. If you wanted to compare males and females and say how alike they are and how much they overlap, what you would need is the means of the two groups and their variances. From these values you can calculate Cohen’s d using the following formula:


d = (M_M - M_F) / s_W

M_M is the mean for males. M_F is the mean for females. s_W is the average within-sex standard deviation.
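To make the definition concrete, here is a minimal sketch of the calculation in Python. It assumes that the "average within-sex standard deviation" is the usual pooled standard deviation (each group's variance weighted by its degrees of freedom); that is one common convention, not necessarily the one every meta-analysis uses. The function name and the scores are made up purely for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(males, females):
    """Cohen's d = (mean of males - mean of females) / within-sex SD.

    Assumption: the within-sex SD is the pooled SD, i.e. the two sample
    variances weighted by their degrees of freedom.
    """
    n_m, n_f = len(males), len(females)
    pooled_var = ((n_m - 1) * variance(males) + (n_f - 1) * variance(females)) / (n_m + n_f - 2)
    return (mean(males) - mean(females)) / math.sqrt(pooled_var)


# Hypothetical test scores, invented for this example only.
male_scores = [52, 48, 55, 50, 45]
female_scores = [49, 51, 47, 53, 50]
print(round(cohens_d(male_scores, female_scores), 2))
```

A positive d means the male mean is higher and a negative d means the female mean is higher, which is the sign convention used in the figures quoted later in this post.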

To get an idea of what this means, click here for a table of overlaps and percent standings for a variety of Cohen’s d. Here is a description from that site of what each of those numbers means:

Cohen (1988) hesitantly defined effect sizes as “small, d = .2,” “medium, d = .5,” and “large, d = .8”, stating that “there is a certain risk inherent in offering conventional operational definitions for those terms for use in power analysis in as diverse a field of inquiry as behavioral science” (p. 25).

Effect sizes can also be thought of as the average percentile standing of the average treated (or experimental) participant relative to the average untreated (or control) participant. An ES of 0.0 indicates that the mean of the treated group is at the 50th percentile of the untreated group. An ES of 0.8 indicates that the mean of the treated group is at the 79th percentile of the untreated group. An effect size of 1.7 indicates that the mean of the treated group is at the 95.5 percentile of the untreated group.

Effect sizes can also be interpreted in terms of the percent of nonoverlap of the treated group’s scores with those of the untreated group; see Cohen (1988, pp. 21-23) for descriptions of additional measures of nonoverlap. An ES of 0.0 indicates that the distribution of scores for the treated group overlaps completely with the distribution of scores for the untreated group; there is 0% of nonoverlap. An ES of 0.8 indicates a nonoverlap of 47.4% in the two distributions. An ES of 1.7 indicates a nonoverlap of 75.4% in the two distributions.
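If you would rather compute these numbers than look them up, the sketch below should reproduce the figures quoted above (roughly the 79th and 95.5th percentiles, and 47.4% and 75.4% nonoverlap). It assumes both groups are normally distributed with equal variances, and it assumes the nonoverlap measure in the table is Cohen's U1; the function names are my own.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def percentile_standing(d):
    """Percentile of the comparison group at which the other group's mean falls."""
    return 100 * STD_NORMAL.cdf(d)


def percent_nonoverlap(d):
    """Cohen's U1: percent of the area covered by the two curves combined
    that is not shared by both (assumes normal curves, equal variances)."""
    p = STD_NORMAL.cdf(abs(d) / 2)
    return 100 * (2 * p - 1) / p


for d in (0.0, 0.8, 1.7):
    print(d, round(percentile_standing(d), 1), round(percent_nonoverlap(d), 1))
```

Plugging in a small effect such as d = 0.2 shows how little separation the "small" range implies in practice.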

Back to Hyde’s meta-analysis, here are the compiled data for all the traits on which males and females were compared:





The tables listed above show a lot of data, so let’s talk about how we interpret that data.

One thing that the data show is that while there are some differences between men and women, for the majority of areas tested these are either not significant or not very large. In her words:

Inspection of the effect sizes shown in the rightmost column of Table 1 reveals strong evidence for the gender similarities hypothesis. These effect sizes are summarized in Table 2. Of the 128 effect sizes shown in Table 1, 4 were unclassifiable because the meta-analysis provided such a wide range for the estimate. The remaining 124 effect sizes were classified into the categories noted earlier: close-to-zero (d < 0.10), small (0.10 < d < 0.35), moderate (0.36 < d < 0.65), large (d 0.66-1.00), or very large (d > 1.00). The striking result is that 30% of the effect sizes are in the close-to-zero range, and an additional 48% are in the small range. That is, 78% of gender differences are small or close to zero. This result is similar to that of Hyde and Plant (1995), who found that 60% of effect sizes for gender differences were in the small or close-to-zero range.
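For anyone who wants to sort effect sizes into these bins themselves, here is a small sketch. The quoted ranges leave tiny gaps (for example between 0.10 and 0.11), so I treat each upper bound as inclusive; that choice and the function name are my assumptions, not Hyde's. The example values are effect sizes quoted elsewhere in this post.

```python
def classify_effect_size(d):
    """Label an effect size using the quoted ranges; the sign of d is ignored."""
    size = abs(d)
    # Assumption: upper bounds are inclusive, because the quoted ranges
    # leave small gaps between neighbouring categories.
    if size <= 0.10:
        return "close to zero"
    if size <= 0.35:
        return "small"
    if size <= 0.65:
        return "moderate"
    if size <= 1.00:
        return "large"
    return "very large"


# Effect sizes quoted in this post.
examples = {
    "throwing velocity": 2.18,
    "relational aggression (direct observation)": -0.45,
    "relational aggression (self-report)": -0.02,
}
for name, d in examples.items():
    print(f"{name}: d = {d:+.2f} -> {classify_effect_size(d)}")
```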

That being noted, there are some differences between men and women that are significant, the most notable being physical abilities (such as throwing distance), sexuality, and aggression (although for aggression the results are somewhat ambiguous):

The largest gender differences in Table 1 are in the domain of motor performance, particularly for measures such as throwing velocity (d = 2.18) and throwing distance (d = 1.98) (Thomas & French, 1985). These differences are particularly large after puberty, when the gender gap in muscle mass and bone size widens.

A second area in which large gender differences are found is some–but not all–measures of sexuality (Oliver & Hyde, 1993). Gender differences are strikingly large for incidences of masturbation and for attitudes about sex in a casual, uncommitted relationship. In contrast, the gender difference in reported sexual satisfaction is close to zero.

Across several meta-analyses, aggression has repeatedly shown gender differences that are moderate in magnitude (Archer, 2004; Eagly & Steffen, 1986; Hyde, 1984, 1986). The gender difference in physical aggression is particularly reliable and is larger than the gender difference in verbal aggression. Much publicity has been given to gender differences in relational aggression, with girls scoring higher (e.g., Crick & Grotpeter, 1995). According to the Archer (2004) meta-analysis, indirect or relational aggression showed an effect size for gender differences of -0.45 when measured by direct observation, but it was only -0.19 for peer ratings, -0.02 for self-reports, and -0.13 for teacher reports. Therefore, the evidence is ambiguous regarding the magnitude of the gender difference in relational aggression.

Still, the evidence suggests overwhelmingly that differences between men and women over a variety of areas have been exaggerated. This is especially suggestive when you look at how the disparity in some aptitudes widens over the course of development:


If we focus for a moment on the disparity in mathematical skills, the differences between men and women are not present in childhood, nor are they present at the onset of puberty. Rather, the disparity appears at the onset of high school, suggesting (at least to me) that the problem is one of socialization rather than biology. Hyde goes through a sizeable body of evidence (much too long for me to discuss here) for how socialization affects the direction of the disparities for these behaviors and aptitudes. Her conclusion is:

The conclusion is clear: The magnitude and even the direction of gender differences depends on the context. These findings provide strong evidence against the differences model and its notions that psychological gender differences are large and stable.

Her overall conclusion is one with which I completely concur, and one that you should read and remember, lest you get in a similar discussion:

The gender similarities hypothesis stands in stark contrast to the differences model, which holds that men and women, and boys and girls, are vastly different psychologically. The gender similarities hypothesis states, instead, that males and females are alike on most–but not all–psychological variables. Extensive evidence from meta-analyses of research on gender differences supports the gender similarities hypothesis. A few notable exceptions are some motor behaviors (e.g., throwing distance) and some aspects of sexuality, which show large gender differences. Aggression shows a gender difference that is moderate in magnitude.

It is time to consider the costs of over-inflated claims of gender differences. Arguably, they cause harm in numerous realms, including women’s opportunities in the workplace, couple conflict and communication, and analyses of self-esteem problems among adolescents. Most important, these claims are not consistent with the scientific data. (Emphasis mine.)

When we discuss issues of gender differences, we should all understand 1) that evidence exists and 2) that the evidence clearly shows that gender differences — where they exist — are not very large.

While I do not agree with Dr. Barres’s contention in his editorial (cited earlier) that the debate itself is intrinsically damaging, I do think that we should aggressively argue that the belief that women are naturally psychologically inferior with respect to some behaviors is demonstrably inaccurate. Some may argue that it is liberal nicety to ignore differences between the sexes, but the advocates of liberal niceties in this case have the science on their side. It would appear in this case that it is not the liberal niceties that are the products of social convention, but rather the belief in the differences themselves.

UPDATE: Commenter ThePolynomial makes some good points that I would like to address:

1. Larry Summers argued that men and women are different in the extremes, not the mean. Sorry if you addressed this in an earlier post, but what is the evidence for and against a different standard deviation for men and women in terms of some cognitive abilities?

2. It looks like in terms of math-related abilities (spacial perception, mental rotation) there really is a pretty significant difference (~.4, ~.6) between men and women. While this might not have a huge impact on your typical precalc class, wouldn’t this affect the professor-level folk quite a bit? It’s true, if these differences are entirely due to socialization, then there’s a problem there. But I don’t think this evidence (maybe there’s more) is conclusive in that regard.

1) The issue of disparity at the tails that is not present at the means is a tough one, but in this case I do not believe it is a problem. First, we are dealing with d values that are so small that even at the edges the effect is not significant. Just to show you what I mean, here is a graph of two bell curves (from the Hyde paper) with a d value of 0.21:


This is the level of difference that we are seeing for mathematical ability, and even at the edges the degree of overlap is quite high.

Second, the evidence suggests that these disparities are neither stable over time nor resistant to correction. Just to give you one example, from the Economist article:

Innate it may well be. That does not mean it is immutable. Spatial ability is amenable to training in both sexes. And such training works. The difference between the trained and the untrained has a d value of 0.4, and one programme to teach spatial ability improved the retention rate of women in engineering courses from 47% to 77%.

Third, there is really no evidence for large differences in the variability of abilities between men and women, particularly in mathematics. Again from the Hyde article:

One caveat should be noted, however. The foregoing discussion is implicitly based on the assumption that the variabilities in the male and female distributions are equal. Yet the greater male variability hypothesis was originally proposed more than a century ago, and it survives today (Feingold, 1992; Hedges & Friedman, 1993). In the 1800s, this hypothesis was proposed to explain why there were more male than female geniuses and, at the same time, more males among the mentally retarded. Statistically, the combination of a small average difference favoring males and a larger standard deviation for males, for some trait such as mathematics performance, could lead to a lopsided gender ratio favoring males in the upper tail of the distribution reflecting exceptional talent. The statistic used to investigate this question is the variance ratio (VR), the ratio of the male variance to the female variance. Empirical investigations of the VR have found values of 1.00-1.08 for vocabulary (Hedges & Nowell, 1995), 1.05-1.25 for mathematics performance (Hedges & Nowell), and 0.87-1.04 for self-esteem (Kling et al., 1999). Therefore, it appears that whether males or females are more variable depends on the domain under consideration. Moreover, most VR estimates are close to 1.00, indicating similar variances for males and females. Nonetheless, this issue of possible gender differences in variability merits continued investigation. (Emphasis mine.)

Taking into account that A) the d values are not large enough, B) the d values are not stable over time, and C) there are not huge differences in variability between men and women, I do not believe that we can explain this disparity in terms of a distribution tail-effect.
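If you want to check for yourself how a given d and variance ratio translate into numbers at the tails, here is a rough sketch. It assumes both sexes are normally distributed and equally numerous, puts females at mean 0 and standard deviation 1, and measures the male mean shift d in female standard deviations (a simplification when the variances differ). The function names are mine; the inputs d = 0.21 and VR = 1.25 are the values discussed above.

```python
import math


def upper_tail(x, mean=0.0, sd=1.0):
    """P(X > x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


def tail_ratio(cutoff, d=0.0, variance_ratio=1.0):
    """Ratio of males to females scoring above `cutoff`.

    Assumptions: females ~ N(0, 1), males ~ N(d, variance_ratio),
    equal numbers of each, and d measured in female SD units.
    """
    male_sd = math.sqrt(variance_ratio)
    return upper_tail(cutoff, mean=d, sd=male_sd) / upper_tail(cutoff)


# Cutoffs are in female standard deviations above the female mean.
for cutoff in (2.0, 3.0, 4.0):
    print(
        cutoff,
        round(tail_ratio(cutoff, d=0.21), 2),                       # mean shift only
        round(tail_ratio(cutoff, variance_ratio=1.25), 2),          # variance ratio only
        round(tail_ratio(cutoff, d=0.21, variance_ratio=1.25), 2),  # both together
    )
```

Real test scores tend to have heavier tails than a normal curve, so these ratios are only as good as the normality assumption behind them.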

2) The issue of mental rotation is a special case. The evidence suggests that men are indeed better at mental rotation than women, but the ability does not generalize into a general superiority at spatial reasoning. What emerges is this idea that on average women use a different strategy than men for spatial reasoning (they tend to organize space in terms of landmarks rather than an abstract map), but that this difference in strategy does not hurt their performance. Again from the Economist article:

Males also have better spatial abilities than females. If asked to imagine rotating a three-dimensional object, a skill useful in engineering, the difference is quite large (d=0.73 and 0.56 in different studies). In this case the limited evidence available suggests the difference is related to the post-birth testosterone surge in boys. Women who were exposed to high levels of testosterone in the womb do not do noticeably better in spatial-rotation tasks.

Men do not excel in all spatial tasks, though. Again contrary to popular myth, men and women are equally good at navigating. But this is another example of a task in which the sexes take different paths to the same destination. Women tend to rely on remembering landmarks, whereas men rely on their geometric skills to work out direction and distance.


  1. #1 ThePolynomial
    August 7, 2006

    Hey Jake, a few questions:

    1. Larry Summers argued that men and women are different in the extremes, not the mean. Sorry if you addressed this in an earlier post, but what is the evidence for and against a different standard deviation for men and women in terms of some cognitive abilities?

    2. It looks like in terms of math-related abilities (spacial perception, mental rotation) there really is a pretty significant difference (~.4, ~.6) between men and women. While this might not have a huge impact on your typical precalc class, wouldn’t this affect the professor-level folk quite a bit? It’s true, if these differences are entirely due to socialization, then there’s a problem there. But I don’t think this evidence (maybe there’s more) is conclusive in that regard.

    3. Could you please name your next band ‘The Cohen’s d?” Thanks.

  2. #2 Shelley Batts
    August 7, 2006

    Nice summary of the literature. I wonder what the ‘d’ would be on the task of “picking out an outfit with matching socks that doesn’t make you look like a douchebag.” Or “time spent in short cuts that invariably get you lost.”

  3. #3 farrold
    August 8, 2006

    Plotting normal distributions on a semi-log scale would yield parabolas of various heights and widths. This representation would make their behavior in the tails more intuitively understandable, and would show why the ratios of distributions with different mean and equal variance grow larger far from the center.

    To offer some numbers, out in the tail at 4 sigma (roughly the 1 in 10,000 level), with d = 0.21 (as in the graph above) the ratio is about 2 to 1, and with d = 0.5 the ratio is about 6 to 1. For more realistic, heavy-tailed distributions the comparable ratios would be smaller, but still larger than would be suggested by visual inspection of a linear-scale graph.

  4. #4 Urijah
    August 9, 2006

    I have to second farrold; that plot is not informative because the details at high sigmas are obscured. Try replotting the distributions from sigma 3 to 5 or 6. That should be much clearer. Of course, even if the d values are the _same_ (if the means are identical, say) what happens at high sigmas is quite interesting. With two normal distributions of mean 0 and variances 1 and 1.25 (the high range you noted) the ratio of the areas to the right of x=3*sqrt(1.25) is 3.391! and at x=4*sqrt(1.25) it is 8.17933!! That is certainly thought provoking (please correct me if my math is wrong–I haven’t done any statistics in quite a while.) If the variances are 1 and 1.05, the corresponding ratio of areas is 1.27862 and 1.5251, *still* quite interesting.

  5. #5 Sandeep Gautam
    August 11, 2006

    The comment from Polynomial deserves closer scrutiny. If, as indicated by Polynomial (and supposedly some prior research), males show greater variability for some trait, say mathematics, and d is calculated using the male variance or sigma, then even though the difference in means is large, d may turn out to be small because of the large sigma used. Whether Hyde’s statistic uses the pooled sigma or not can make a big difference to the outcomes.
    All said and done a really nice post that clarifies many misconceptions surrounding gender differences.