We have had an ongoing discussion on this blog about whether the disparity between women and men in the sciences is the result of an innate difference in cognitive ability or of some social phenomenon such as selective participation or discrimination. Unfortunately, one of the complexities of this debate is that there is really no good objective standard for how good a scientist is. You can look at publication rates and journal impact, but comparing these numbers across fields is difficult. We lack objective measures.
It would be interesting to look at an analogous system to science — something that requires lots of spatial and mathematical skill — but has objective measures. This system should also have a male:female disparity. Looking at this system we might be able to better understand why there are fewer women and apply this knowledge to science as an occupation.
With this in mind, Chabris and Glickman, publishing in the latest issue of the journal Psychological Science, have done a huge retrospective study using 13 years of data on matches and players from the US Chess Federation.
The US Chess Federation has a ranking system whereby players are followed throughout their playing lives. This allows us to monitor how well boys and girls do in their earliest years, how many of them stay involved or leave, and how many of them become grandmasters. Furthermore, the disparity is larger than in science, which makes this data set very interesting. Of the 894 chess grandmasters in 2004, only 8 were women.
Before I talk about their data, Chabris and Glickman summarize very nicely the explanations that could be presented for the disparity between men and women in chess performance:
- First, there could be some innate difference in ability between men and women overall with respect to the skill required to play chess well. This difference in average or in variability need not be large; at the upper tail of the distribution (of, say, spatial ability) where chess players operate, a small difference would result in a large difference in representation. They call this the ability distribution hypothesis.
- Second, discrimination could result in a difference in participation through different standards. However, they note that this is not a problem for this particular study because chess rankings are objective measures. You can’t discriminate against someone when their gender does not enter into the calculation of their performance.
- Third, there could be a differential drop-out rate between boys and girls. Equal numbers of boys and girls with equal abilities could begin chess training, but fewer girls could see it through to becoming chess grandmasters. They call this the differential dropout hypothesis.
- Finally, fewer women could self-select to participate in chess. If fewer talented women choose to participate in chess in the first place, by numbers alone there will be fewer in the resulting grandmaster pool. They call this the participation rate hypothesis:
Anyone who visits an open chess tournament will be struck less by the lack of women at the top of the results table than by their near absence at all levels. Only 9.7% of all USCF-rated games in 2004 were played by women. It is possible that the lack of women at the top is an artifact of their lower overall participation rate (Charness & Gerchak, 1996): Even if men and women have the same underlying ability distribution, a larger number of top-rated players will be men if the overall number of men competing is greater (the participation-rate hypothesis). That is, if fewer women than men even begin to participate in organized competition, dropout rates (and cognitive endowments) could be equal, but women would still be relatively absent at the top.
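To see how far participation alone can go, here is a minimal simulation sketch in Python. It assumes, purely for illustration, that men's and women's ratings are drawn from the same normal distribution (mean 1500 and standard deviation 300, both made-up values) and that the player pool is split 90.3% to 9.7%. Note that the 9.7% figure in the quote refers to games, not players, so the pool sizes below are hypothetical. The function name `count_above` is my own as well.

```python
import random

def count_above(n_players, threshold, rng, mean=1500.0, sd=300.0):
    """Count simulated players whose rating exceeds `threshold`.

    Every player is drawn from the same normal distribution, so the
    two groups differ only in how many players are sampled.
    """
    return sum(1 for _ in range(n_players) if rng.gauss(mean, sd) > threshold)

rng = random.Random(0)       # fixed seed so the run is repeatable

# Hypothetical pool sizes (assumption, not data from the study).
n_women = 9_700
n_men = 90_300
threshold = 2200.0           # "chess masters are rated above 2200"

women_top = count_above(n_women, threshold, rng)
men_top = count_above(n_men, threshold, rng)

print("women above threshold:", women_top)
print("men above threshold:  ", men_top)
print("share of top players who are women:",
      round(women_top / (women_top + men_top), 3))
```

Under these assumptions both groups have exactly the same chance of clearing 2200, yet far more of the simulated masters are men simply because far more men were sampled.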
The study examined all chess players that were active from 1992 to 2004 — looking at age, sex, zip code, and rankings. More information about the USCF ratings system can be found here. They describe the ratings score as follows:
A player’s USCF rating is an estimate of his or her current playing strength on a scale that ranges generally from 100 to 3000; higher ratings are associated with better playing ability…Average tournament players are usually rated between 1400 and 1600, chess masters are rated above 2200, and world-class players tend to be rated above 2500. USCF ratings are essentially estimates of merit parameters from Bradley and Terry’s (1952) model for paired comparisons, calculated using an approximately Bayesian filtering algorithm to update ratings over time (Glickman, 1999).
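For readers unfamiliar with rating scales, a rating difference can be turned into an expected score. The sketch below uses the conventional logistic Elo formula with a 400-point scale, which is a Bradley-Terry model written in rating units. Treat the formula and the function name `expected_score` as assumptions for illustration; the USCF's actual procedure, as the quote notes, also involves a filtering algorithm for updating ratings that is not reproduced here.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of player A against player B (win = 1, draw = 0.5).

    Conventional logistic Elo curve; `scale` = 400 is the usual
    convention and is assumed here rather than taken from the paper.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))

# Two equally rated players each expect half a point per game.
print(expected_score(1500, 1500))

# A 400-point favorite expects 10/11 of a point per game.
print(round(expected_score(1900, 1500), 3))
```

On this scale, equally rated players each expect to score 0.5, and a player rated 400 points higher expects to score about 0.91.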
After examining the data, the researchers reached four conclusions, summarized below:
- They found that men and women differed in chess ability in all age groups even after differences like frequency of play (read: level of training) or age were taken into account. The disparity between men and women in ability exists at the beginning and persists across all age groups. At least ostensibly, this would lend credence to the ability distribution hypothesis in the sense that it suggests the mean abilities of men and women are innately different. The last piece of data looks at whether that is true.
- They found no greater variance in men than women. It had been suggested that since science selects for individuals at the upper tail of the distribution, a higher variance in men than women might explain their greater representation. However, the researchers found that, with respect to chess, if anything women had a higher variance than men in most age groups. Upper tail effects do not explain the differences in the numbers of grandmasters.
- They found that women and men do not drop out at different rates once ability and age are factored out. For example, if you are not very good at chess you are more likely to stop playing tournaments, but girls and boys who are equally good are equally likely to stop playing. This strikes a blow at the differential dropout hypothesis.
- Finally, here is the interesting part. If you look at the participation rate of women and relate that to performance, you find that in cases where the participation rate of women and men is equal the disparity in ability vanishes. Basically, this means that in ZIP codes where at least as many girls as boys play there is no great disparity between male and female ability, and certainly not a disparity large enough to explain the difference in the numbers of grandmasters. In their words:
Finally, we addressed the participation-rate hypothesis. If in the general population the number of boys who play chess is substantially larger than the number of girls, the best ones ultimately becoming USCF members and playing competitively, then it follows statistically that the average boys’ ratings will be higher than the average girls’ ratings (among competitive players) even if the distribution of abilities in the general population is the same (Charness & Gerchak, 1996; Glickman & Chabris, 1996). In fact, far fewer girls than boys enter competitive chess, which suggests that the general population of chess-playing girls is much smaller than that of boys. External factors like the relative lack of female role models among the world’s top players and the prospect of playing a game dominated by boys may be discouraging to girls (or their parents), either directly reducing their likelihood of learning how to play in the first place or indirectly reducing their initial performance in competitive play via test anxiety or stereotype threat (Steele, 1997). Thus, it is possible that, on average, girls have the chess-relevant cognitive abilities, but the larger number of boys playing chess leads to significantly higher male ratings in the USCF population.
Boys generally had higher ratings than girls, particularly in the male-dominated ZIP codes. However, in the four ZIP codes with at least 50% girls (areas in Oakland, CA; Bakersfield, CA; Lexington, KY; and Pierre, SD), boys did not have higher ratings. In Oakland, with the greatest proportion (68%) of girls in the sample, the average rating of girls was higher than that of boys, though not significantly so. Combining all ZIP-code areas where the proportion of girls was at least 50%, the sex difference was only 35.2 points in favor of males, which was not significant (p = .59). The same result was obtained in an age-adjusted analysis, which yielded a sex difference of 40.8 points (p = .53).
The fairly constant mean male advantage until the 50% female participation rate was reached suggests a threshold effect: Factors limiting girls’ performance levels may depend on their being in the minority, but not on the relative size of the male majority (in other words, 50% girls may constitute a “critical mass”).
Making sense of this data
An analogy may help make sense of this data. Why does it seem like the US has substantially fewer good soccer players than the rest of the world? We clearly have good athletes. We play other sports well. We train athletes just as well. Why do other countries do so much better?
The answer is that when you are a good athlete in the US, you do not play soccer. You end up playing something else like football or basketball. The difference in performance is related to a difference in participation.
This data strongly argues that the difference in performance of women in chess is also a problem of participation. The problem is not that women can’t play chess well. The problem is that not enough of the women who could play chess well are choosing to play. There may be several social reasons why they choose not to do so or are discouraged from doing so; I will let you speculate about that at your leisure. However, this data strongly supports the participation rate hypothesis.
We could apply this data to our experience in science. There were, I think, 4 women in my graduating class at Stanford who majored in Computer Science alongside 100 or so men. The problem is not that there are no women who could be Computer Science majors. (The women I met at Stanford were certainly gifted enough.) The problem was that for whatever reason they either didn’t want to or weren’t encouraged to participate in that major.
The finding that there is a critical mass of participation is also interesting. I think it will certainly inform the debate to know that at least with respect to this system, if you can get the participation up to 50% you can solve the performance problem.