You’ve probably already heard last week’s news that a study published in Science indicates that the gender gap between girls and boys in mathematical performance may be melting faster than the polar ice caps. The study, “Gender Similarities Characterize Math Performance” by Janet S. Hyde et al., appears in the July 25, 2008 issue of Science (behind a paywall).
Hyde et al. revisit the results of a meta-analysis published in 1990 (J. S. Hyde, E. Fennema, S. Lamon, Psychol. Bull. 107, 139 (1990)) that found negligible gender differences in math ability in the general population but significant differences (favoring boys) in complex problem solving that appeared during the high school years. This 1990 meta-analysis drew on data collected in the 1970s and 1980s. The present study asked whether more recent data would support the same findings.
In previous decades, girls took fewer advanced math and science courses in high school than boys did, and girls’ deficit in course taking was one of the major explanations for superior male performance on standardized tests in high school. By 2000, high school girls were taking calculus at the same rate as boys, although they still lagged behind boys in the number of them taking physics. Today, women earn 48% of the undergraduate degrees in mathematics, although gender gaps in physics and engineering remain large.
The researchers used as their data results from the state assessment tests mandated by No Child Left Behind given to students in grades 2 through 11. Specifically, they drew on results from California, Connecticut, Indiana, Kentucky, Minnesota, Missouri, New Jersey, New Mexico, West Virginia, and Wyoming (since these states were able to break down the results by grade level, gender, and ethnicity), a pool of more than 7 million students.
In this data, Hyde et al. found no evidence of difference in performance on the mathematical assessments between boys and girls on average.
Effect sizes for gender differences, representing the testing of over 7 million students in state assessments, are uniformly <0.10, representing trivial differences. Of these effect sizes, 21 were positive, indicating better performance by males; 36 were negative, indicating better performance by females; and 9 were exactly 0. From this distribution of effect sizes, we calculate that the weighted mean is 0.0065, consistent with no gender difference.
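For readers who want to see what an "effect size" and a "weighted mean" of effect sizes look like as arithmetic, here is a minimal sketch. It assumes the effect size is the standardized mean difference (male mean minus female mean, divided by the pooled standard deviation), which matches the sign convention in the quote. The function names are mine, and the weighting scheme is left to the caller because the passage above does not say which weights the authors used.

```python
from statistics import mean, variance


def cohens_d(male_scores, female_scores):
    """Standardized mean difference; positive means males scored higher.

    Assumption: pooled standard deviation built from the two sample variances.
    """
    n_m, n_f = len(male_scores), len(female_scores)
    pooled_var = (
        (n_m - 1) * variance(male_scores) + (n_f - 1) * variance(female_scores)
    ) / (n_m + n_f - 2)
    return (mean(male_scores) - mean(female_scores)) / pooled_var ** 0.5


def weighted_mean_effect(effects, weights):
    """Weighted average of effect sizes.

    Assumption: the weights (for example, sample sizes) are chosen by the caller.
    """
    return sum(d * w for d, w in zip(effects, weights)) / sum(weights)
```

With a mix of small positive and small negative values like the ones described, the weighted mean lands close to zero, which is what "consistent with no gender difference" amounts to.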
Even if there is no gender difference on average, it has long been claimed that males show more variance in mathematical abilities — in other words, that more boys will be found far below and far above the average, while girls will be more tightly clustered around the average. This claim of greater variance is sometimes invoked to explain gender disparities in science and engineering careers, where presumably you need people at the very top of the range of intellectual abilities, and there just happen to be more males that fit the bill.
So Hyde et al. examined the variance in their data:
The variance ratio (VR), the ratio of the male variance to the female variance, assesses these differences. Greater male variance is indicated by VR > 1.0. All VRs, by state and grade, are >1.0 [range 1.11 to 1.21]. Thus, our analyses show greater male variability, although the discrepancy in variances is not large.
At the top of the range on the test results (in data from Minnesota 11th graders), what kind of gender differences do the data show?
For whites, the ratios of boys:girls scoring above the 95th percentile and 99th percentile are 1.45 and 2.06, respectively, and are similar to predictions from theoretical models. For Asian Americans, ratios are 1.09 and 0.91, respectively.
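To get a feel for how a modest variance ratio turns into a lopsided ratio in the tails, here is a rough sketch. It is not the authors' model: it assumes both groups' scores are normally distributed, that boys and girls are equally numerous, and that the percentile cutoff is taken over the combined population. The function name and the parameter `d` (the mean difference in units of the girls' standard deviation) are my own.

```python
from statistics import NormalDist


def tail_ratio(vr, pct, d=0.0):
    """Boys:girls ratio above the pct-th percentile of a 50/50 mixed population.

    Assumptions: normal scores, girls ~ N(0, 1), boys ~ N(d, vr).
    """
    girls = NormalDist(0.0, 1.0)
    boys = NormalDist(d, vr ** 0.5)
    target = 1.0 - pct / 100.0  # fraction of the combined population above the cutoff

    # Bisection for the cutoff score in the mixed population.
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        above = 0.5 * (1 - boys.cdf(mid)) + 0.5 * (1 - girls.cdf(mid))
        if above > target:
            lo = mid  # too many people above: move the cutoff up
        else:
            hi = mid
    cut = (lo + hi) / 2
    return (1 - boys.cdf(cut)) / (1 - girls.cdf(cut))


for pct in (95, 99):
    print(pct, round(tail_ratio(1.2, pct), 2))
```

Under these assumptions, a variance ratio above 1.0 with equal means gives a boys:girls ratio above 1 in the upper tail, and the ratio grows as the cutoff moves further out. That is the qualitative pattern in the numbers for white students, and not the pattern in the numbers for Asian American students.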
If these differences in the top echelons of math ability are being used to explain the gender imbalances in science, engineering, and mathematics as courses of university study or career paths, do these numbers support the explanation?
If a particular specialty required mathematical skills at the 99th percentile, and the gender ratio is 2.0, we would expect 67% men in the occupation and 33% women. Yet today, for example, Ph.D. programs in engineering average only about 15% women.
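The arithmetic in that passage is easy to check. A minimal sketch, assuming equal numbers of men and women in the underlying population (the function names are mine):

```python
def shares_from_ratio(men_per_woman):
    """Convert a men:women ratio into (share of men, share of women)."""
    men = men_per_woman / (men_per_woman + 1.0)
    return men, 1.0 - men


def ratio_from_women_share(women_share):
    """Convert a share of women into the implied men:women ratio."""
    return (1.0 - women_share) / women_share


men, women = shares_from_ratio(2.0)
print(f"{men:.0%} men, {women:.0%} women")
print(f"{ratio_from_women_share(0.15):.1f} men per woman")
```

A ratio of 2.0 gives the 67%/33% split quoted above, while 15% women corresponds to roughly 5.7 men per woman, far more than the ratio of about 2 at the 99th percentile would predict.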
Not to mention, we would expect more Asian American women than Asian American men in these engineering programs if these educational choices simply followed ability. So at the very least, greater variance among males is not sufficient to explain the current gender breakdown in math, science, and engineering. There must be other factors at work.
Earlier studies had reported gender differences in complex problem solving, favoring boys. (They also reported gender differences in computation and grasp of concepts, favoring girls, although these differences never seemed to play much of a role in the coverage of these studies in the popular press.) It seems at least possible that some of this was connected to coursework — if you’re in a course that introduces you to complex problems and strategies for solving them, you might then be more likely to perform better in complex problem solving tasks.
Hyde et al. tried to see what their more recent data indicated about complex problem solving and gender:
[W]e coded test items from all states where tests were available, using a four-level depth of knowledge framework. Level 1 (recall) includes recall of facts and performing simple algorithms. Level 2 (skill/concept) items require students to make decisions about how to approach a problem and typically ask students to estimate or compare information. Level 3 (strategic thinking) includes complex cognitive demands that require students to reason, plan, and use evidence. Level 4 (extended thinking) items require complex reasoning over an extended period of time and require students to connect ideas within or across content areas as they develop one among alternate approaches. We computed the percentage of items at levels 3 or 4 for each state for each grade, as an index of the extent to which the test tapped complex problem-solving. The results were disappointing. For most states and most grade levels, none of the items were at levels 3 or 4. Therefore, it was impossible to determine whether there was a gender difference in performance at levels 3 and 4.
Here’s where we get into the debate about whether NCLB’s linkage of test results and school funding has been a positive or pernicious influence on test design, what our kids are learning, etc. Let me just reiterate what teachers should already know: the assessment tells the students what they really have to learn. If you think a particular competency or bit of knowledge is important but you don’t assess it, your students are smart enough not to waste their effort absorbing it.
I suppose this means that even if you’re not trying to teach to the test, to the extent that the students have reasonable information about what the test items will be like and what they will cover, the students are going to “learn to the test”.
Also, to the extent that complex problem solving is a competence that it would be good for students to develop prior to starting a college-level program in mathematics, science, or engineering, if the NCLB tests are the assessments driving secondary math education, we may be in trouble. I’m hopeful that there are still old-school teachers giving weekly quizzes focused on complex problem solving, and still significant populations of kids preparing for Advanced Placement exams or math team competitions. But it sure would be nice to raise the bar to a level including complex problem solving, and figure out how to get more kids over that bar, rather than acquiescing to high-stakes tests that apparently view complex problem solving as a frill.
So, in summary, when girls are taking the same math coursework as boys, the gender differences in mathematical performance seem to shrink to insignificance. And, NCLB assessments may be dumbing down math for everyone.
What impact that will have on careers in math, science, and engineering remains to be seen.
Janet S. Hyde, Sara M. Lindberg, Marcia C. Linn, Amy B. Ellis, Caroline C. Williams, “Gender Similarities Characterize Math Performance.” Science, Vol. 321, no. 5888 (25 July 2008), pp. 494-495.