Math performance in the US: boys and girls have same mean, different variance

Sorry for the light blogging, everyone. It has been a busy, busy week.

Some of you may have caught Janet Hyde's latest paper looking at data from the No Child Left Behind Act and math performance in the US. Under the No Child Left Behind Act, states are required to test children for a variety of skills on a yearly basis. The paper used these tests to look at math performance across grade levels, broken down by gender, for 10 states.

Here is the key graph:

[Figure: hydegradelevel.jpg, math performance by gender across grade level, from Hyde et al.]

The data includes a measure of effect size called Cohen's d (I discussed it here) and a measure called the variance ratio (VR -- which is just the ratio of the variances).
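
For readers who want to see how these two statistics are computed, here is a minimal Python sketch. It assumes the common pooled-standard-deviation form of Cohen's d and takes the variance ratio as boys' variance over girls' variance. The function names and the toy scores are mine, made up for illustration, and are not taken from the paper.

```python
import math
import statistics


def cohens_d(boys, girls):
    """Standardized mean difference (boys minus girls), using a pooled SD."""
    n_b, n_g = len(boys), len(girls)
    var_b = statistics.variance(boys)   # sample variance (n - 1 denominator)
    var_g = statistics.variance(girls)
    pooled_sd = math.sqrt(((n_b - 1) * var_b + (n_g - 1) * var_g) / (n_b + n_g - 2))
    return (statistics.mean(boys) - statistics.mean(girls)) / pooled_sd


def variance_ratio(boys, girls):
    """VR: boys' variance divided by girls' variance (assumed direction)."""
    return statistics.variance(boys) / statistics.variance(girls)


# Made-up scores, purely illustrative: same mean, different spread.
boys = [40, 50, 60, 70, 80]
girls = [50, 55, 60, 65, 70]
print(cohens_d(boys, girls), variance_ratio(boys, girls))
```

The toy data are built so that the two groups share a mean but differ in spread, which is the pattern the rest of this post is about.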

You can see from the data that the difference between boys and girls for math performance is statistically insignificant across grade level. (A Cohen's d less than about .2-.3 is considered basically insignificant.) Hyde's work has always been important in the debate over how to explain the relative dearth of women in math and the sciences. Some people -- notably former Harvard President Larry Summers -- attributed that dearth to differences in innate ability in mathematics between men and women. This data would argue against that assertion, and that is how these findings have been interpreted in a variety of publications.

On the other hand, as many of my friends and several bloggers pointed out, this is not the end of the statistics. The observed variance differs consistently between the boys and the girls across grade level. One of Larry Summers's other arguments was that the difference in participation between men and women could be attributed to this difference in variation. While the mean ability of men and women is statistically indistinguishable, the tails of the distribution would contain more men than women. (Men would have more dullards and more geniuses due to higher variance.) This upper tail effect could explain the relative participation of men and women in the sciences.
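
To make the tail argument concrete, here is a rough Python sketch of the arithmetic. It assumes both distributions are normal with the same mean, that the two groups are the same size, and that the cutoff is measured in female standard deviations. The variance ratio of 1.2 is a hypothetical value I picked for illustration, not a figure from the paper, and the function names are my own.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def male_female_ratio(cutoff_sd, vr):
    """Ratio of men to women above a cutoff, given equal means.

    cutoff_sd is measured in female standard deviations above the shared mean;
    vr is the male/female variance ratio, so the male SD is sqrt(vr).
    Assumes both distributions are normal and the groups are equal in size.
    """
    return upper_tail(cutoff_sd / math.sqrt(vr)) / upper_tail(cutoff_sd)


def male_share(ratio):
    """Convert a men:women ratio into the fraction of men."""
    return ratio / (1 + ratio)


# VR = 1.2 is a hypothetical value chosen for illustration.
for cutoff in (1, 2, 3, 4):
    r = male_female_ratio(cutoff, vr=1.2)
    print(f"{cutoff} SD: ratio {r:.2f}, male share {male_share(r):.0%}")
```

Under these assumptions the ratio grows as the cutoff moves further from the mean, which is the upper tail effect in question. With a variance ratio of exactly 1 the ratio is 1 at every cutoff.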

Because of this point, Alex Tabarrok at Marginal Revolution was highly critical of the coverage of this paper and suggested that the authors attempted to specifically downplay this observation:

Notice that the greater male variance is observable in the earliest data, grade 2. (In addition, higher male VRS have been noted for over a century). Now the study authors clearly wanted to downplay this finding so they wrote things like "our analyses show greater male variability, although the discrepancy in variances is not large." Which is true in some sense but the point is that small differences in variance can make for big differences in outcome at the top. The authors acknowledge this with the following:

If a particular specialty required mathematical skills at the 99th percentile, and the gender ratio is 2.0, we would expect 67% men in the occupation and 33% women. Yet today, for example, Ph.D. programs in engineering average only about 15% women.

So even by the authors' calculations you would expect twice as many men as women in engineering PhD programs due to math-ability differences alone (compare with the media reports above). But what the author's don't tell you is that the gender ratio will get larger the higher the percentile. Larry Summers in his infamous talk, was explicit about this point:

...if one is talking about physicists at a top twenty-five research university, one is not talking about people who are two standard deviations above the mean...But it's talking about people who are three and a half, four standard deviations above the mean in the one in 5,000, one in 10,000 class. Even small differences in the standard deviation will translate into very large differences in the available pool substantially out.

If you do the same type of calculation as the authors but now look at the expected gender ratio at 4 standard deviations from the mean you find a ratio of more than 3:1, i.e. just over 75 men for every 25 women should be expected at say a top-25 math or physics department on the basis of math ability alone (see the extension for details on my calculation). Now does this explain everything that is going on? I doubt it. As Summers also pointed out it takes more than ability to become a professor at Harvard and if there are variance differences in characteristics other than ability (and there are) we can easily get a even larger expected gender ratio. (Link in original.)

While I agree with Tabarrok that not talking about the variance is a significant omission in the coverage, I understand why Hyde et al. decided to downplay the point. Further, I disagree with his assertion that "higher male VRS have been noted for over a century" without caveats.

A couple of problems:

  • First, it is indeed true that higher male variance has been noted in some studies comparing males and females on psychological traits. Tabarrok notes this paper. However, a broad survey of psychological differences between men and women found that the variance depends heavily on what attribute you are measuring (I discussed this here). Further, the difference in variance between men and women may be consistent in the US, but international comparisons show different VRs in different countries. The median internationally is that boys have slightly greater variability for math, but there are several countries where girls have greater variability than boys. The present study also indicated that the variance ratio for boys and girls is inverted for Asian American students -- the girls have higher variance. Thus, higher variance in boys is not always a robust finding.
  • Second, the suggestion that differences in participation are related to upper tail effects in math ability is predicated on the notion that people in math and sciences are drawn from the upper tail. This is demonstrably untrue (I discussed this here). Catherine Weinberger, in that study, found that individuals with bachelor's degrees in technical disciplines like science and engineering were drawn from the upper 40% of the mathematics ability distribution. This is a far cry from the 4 SD upper tail that Tabarrok notes would be necessary for a 3:1 split between men and women. (This is approximately the difference in participation between men and women in disciplines like physics.) The notion that there is an upper tail (and a very high upper tail too) from which professors are universally drawn is predicated on the idea that there is some quantifiable "right stuff," and that professor search committees are capable of adequately assessing that to the exclusion of women. "We know the upper tail, and you, Madam, are not in it." Yet anyone who has looked around their academic department notes a large variation in abilities. There is no "right stuff," or rather there are many right stuffs that are applied to lesser or greater success.
  • Third, the observation of higher variance in boys says nothing about whether this is innate or can be modified by the right environment. A recent comparison across countries found that countries with more gender equity have smaller disparities between boys and girls in math performance.
  • Fourth, I am much less suspicious than Tabarrok of Hyde et al. for downplaying the higher variance finding. Dr. Hyde is an expert in this field, and it is reasonable to expect her to know that variance differences in these studies are sometimes a fleeting phenomenon. That the authors downplayed this finding is not surprising in light of the other work in this field.

Let me conclude by saying that I agree that the coverage of this article should include some statement about the observed difference in variance. Statistical distributions are not just means. That is just intellectual honesty. However, considering the other data in this field I still do not believe that differences in upper tail effects can account for the gender difference in participation in math and sciences.

Hyde, J.S., Lindberg, S.M., Linn, M.C., Ellis, A.B., Williams, C.C. (2008). DIVERSITY: Gender Similarities Characterize Math Performance. Science, 321(5888), 494-495. DOI: 10.1126/science.1160364

Good points on a fascinating paper.

I've taught Math to several thousand students, on and off since 1973, in college, University, middle school, and high school. In between, I am a professional Mathematician, publishing in Mathematical Physics, Mathematical Biology, Mathematical Economics. My wife is a Physics professor.

The Diversity debate is not mere ideology. I see important differences in Dyscalculia between young women and young men, but see no compelling statistical analysis.

Good post. One quibble is that you say that the mean effect sizes are statistically insignificant. I think that's not true. Since the SEM of d is much smaller than the d, those differences are *statistically* significant. This is simply because they tested a very large sample, so any difference at all would show up. On the other hand, you're quite right that having d that small is *practically* insignificant. I would say, instead of what you said, that there is a very very small difference in average performance, and that it's not consistent which gender does better.

I wish that they had made histograms available with separate lines for boys and girls. They had so much data that it would have told a very clear story...

I think you're asking the wrong question. It's not what females lack, but what males lack.

I am not sure you will ever find a brain functional difference in Math ability to explain this.

I do not think females are lacking anything. It's more what the men are lacking that drives them into engineering, math and the sciences.

For those males who lack some social skills but still have a need for strong competition, this would drive many of them into these fields, because they provide a rigid logical construct for competition and social interaction.

It's a social crutch, in a sense, that is more suited to adolescent males.

Any reasonable person with social skills would never spend so much solitary time alone in front of books or computers and other machinery to become so proficient.

This is how the males are obsessively driven into that 99 percentile.

I have noticed this bias in computer programming, and it's clear that anti-social males can spend 16+ hr per day on the computer. Some part of this is avoidance of unstructured social interaction, which over time becomes almost obsessive-compulsive because they attempt to satisfy all social and emotional needs with technical achievements instead.

This starts at a young age, around puberty, and I think this is where your answers will be found, not by looking at just math abilities.

"Any reasonable person with social skills would never spend so much solitary time alone in front of books..."

Like that loser Abe Lincoln, reading the Classics by candle-light? Nobody like that could make a difference in the world. Or that playboy fiddler Al Einstein?

It's anecdotal evidence being swept aside by statistical methodology. We all have hunches. Science is different.

Larry Summers never suggested there were differences in the mean. He talked exclusively about differences in the high tail. You can read his comments

To respond to your points:
1. There's always lots of noise in non-experimental data. This observation is beside the point.

2. Summers was talking about underrepresentation in elite math heavy departments. The underrepresentation (if it exists) in other departments is beside the point.

3. Your post on gender equality and math ability was about differences in means. Beside the point.

4. Who cares how qualified the researcher is? Shouldn't we let the data do the talking?

Assuming she did a good job disentangling the signal from the noise, her data are saying there are significant differences between males and females in the tails of the distribution of ability. This creates a very large difference in the number of qualified men and women for elite technical positions.

I've yet to see a serious attempt at accounting for how much underrepresentation is explained by this large difference in ability. My back of the envelope calculations suggest differences in ability should lead to a 3 to 1 disparity in top math departments. (See here: http://www.ambrosini.us/wordpress/2008/07/women-avert-your-eyes-fun-wit…) We see 5 to 1 disparities so there's room for other explanations.

One very general point should be made: finding a new generation of scientists is one thing, while raising a scientifically literate population is another. Gender inequality in the Mathematics Department is one problem, and widespread innumeracy in the citizen body is another. We care about having a well-educated electorate, and therefore, we care about the mean.

The present study also indicated that the variance ratio for boys and girls is inverted for Asian American students -- the girls have higher variance. Thus, higher variance in boys is not always a robust finding.

We should start a pool on how long it will take an evolutionary psychologist to invent an "adaptation" story for this.

Thanks for this careful discussion.

Some readers might be interested in the history of the "variability hypothesis"--the hypothesis that males of a given species are "more variable" than females. This dates back to Darwin.

The Wikipedia article on the variability hypothesis seems reasonably accurate (however, I make no claims to be an expert on this subject). A more thorough treatment (available via JSTOR) is:

Stephanie Shields. (1982). The variability hypothesis: The history of a biological model of sex differences in intelligence. Signs, 7(4), 769-797.

Conjectures about greater male variability used to include physical attributes as well as mental attributes, but this seems no longer to be the case. This may be due to the work of Leta Hollingworth in the early part of the 20th century. According to Margaret Rossiter's Women Scientists in America, Hollingworth was skeptical about the variability hypothesis and conducted several studies of physical and mental attributes whose results were published between 1913 and 1916. In one of these, for example, she measured length and weight of neonates (whose physical measurements were presumably not affected by differences in societal attitudes about gender) and found no differences in variability between the genders.