As I believe I've said before, if anything good has come from the Larry Summers debacle of a few years ago, it's that it inspired some really interesting research on gender differences in math. If you've been reading this blog for a while, you've probably guessed that one of my favorite topics in that research is stereotype threat. Stereotype threat is, according to Claude Steele(1), "the threat of being viewed through the lens of a negative stereotype or the fear of doing something that would inadvertently confirm that stereotype," and there's now a pretty substantial literature showing that, in the lab at least, people's performance can be seriously affected by simply making them aware of a relevant stereotype. But the goal of stereotype threat research is not to screw with people in the lab, of course; it's to understand and perhaps alleviate real-world problems like test score gaps between various groups. So researchers have recently begun to take stereotype threat studies into the wild.
The first study I know of on stereotype threat in the real world was worrying. In the lab, making people aware of their race or gender negatively affects their performance on race- and gender-stereotyped tasks(2). This is one of the classic stereotype threat findings, and it seemed a prime candidate to test in the wild. However, when Stricker and Ward used a related manipulation, moving gender and race questions to the end of a test, designed to reduce stereotype threat, they found no "statistically or practically significant" differences in performance for any target group on the AP Calculus exam(3). This is somewhat disturbing, but, when Danaher and Crandall reanalyzed Stricker and Ward's data, they found that there was in fact a rather large and statistically significant effect of the manipulation for gender at least(4); so large, in fact, that simply by moving the gender question to the end of the test, you'd expect 4700 more female students to receive AP calculus credit every year(5). In their discussion of this finding, Good et al. suggest that the politically-charged atmosphere surrounding gender differences in mathematics may have been a factor in Stricker and Ward's reporting no significant differences when in fact rather large differences existed.
Another impressive and, fortunately for all of us, clearer example of counteracting stereotype threat in the wild comes from a 2006 paper by Cohen et al.(6). They had African American and European American participants write essays at the beginning of the school year, with half of them writing on their most important values (the self-affirmation condition) and the other half on their least important values (the control condition). In two studies, over two different years, African American students in the affirmation condition showed improved performance (around 1/3 of a grade point on a 4-point grade scale), reducing the gap between their performance and that of European American students by about 40%. So, just by making them feel better about themselves once, at the beginning of the academic year, you can improve their performance for the entire academic year. That's impressive.
Finally, in the most recent example, Good et al. had advanced college calculus students take a practice exam that they were told would test their readiness for the upcoming real exam, and would also get them extra credit based on their score. Participants were randomly assigned to one of two conditions: the reduced-threat condition, in which they were told that the exam had been thoroughly tested and had shown no gender differences, and the control condition, in which gender wasn't mentioned (gender stereotypes in math are pervasive, so it's likely that simply taking a test will activate them). Male participants performed equally well in both conditions, while female participants performed significantly worse in the control condition than in the reduced-threat condition. In fact, while female participants in the control condition performed worse than male participants, the female participants in the reduced-threat condition performed better than all of the male participants. Once again, then, a simple manipulation was able to significantly reduce the effect of stereotype threat on performance.
What I like about these studies is that the manipulations they involve are really easy to administer in any class or at the beginning of any test. In fact, though more research needs to be done, I see absolutely no reason for educators not to begin moving gender and race questions to the end of tests, and to include passages in test instructions that suggest that there are no gender/race differences in performance on the test. They would have nothing to lose, and as these studies suggest, there may be much to gain from taking these simple measures to reduce racial and gender gaps in performance.
1. Steele, C. M. (1999). Thin ice: "Stereotype threat" and Black college students. Atlantic Monthly, August, 44-54.
2. Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69, 797-811.
3. Stricker, L. J., & Ward, W. C. (2004). Stereotype threat, inquiring about test takers' ethnicity and gender, and standardized test performance. Journal of Applied Social Psychology, 34, 665-693.
4. Danaher, K., & Crandall, C. S. (In Press). Stereotype threat in applied settings re-examined. Journal of Applied Social Psychology.
5. Good, C., Aronson, J., & Harder, J. A. (In Press). Problems in the pipeline: Stereotype threat and women's achievement in high-level math courses. Journal of Applied Developmental Psychology.
6. Cohen, G., Garcia, J., & Master, A. (2006). Reducing the racial achievement gap: A social-psychological intervention. Science, 313, 1307-1310.
Is there a draft of the Danaher & Crandall paper available? It would be interesting to see how they derived such different results from Stricker & Ward from the same data set.
Excellent post. I agree that we might as well implement these interventions NOW.
Dan, you'll have to email Crandall for it (Danaher is, I believe, on a leave of absence right now).
However, when Stricker and Ward used a related manipulation, moving gender and race questions to the end of a test, designed to reduce stereotype threat, they found no "statistically or practically significant" differences in performance for any target group on the AP Calculus exam(3). This is somewhat disturbing, but, when Danaher and Crandall reanalyzed Stricker and Ward's data, they found that there was in fact a rather large and statistically significant effect of the manipulation for gender at least(4); so large, in fact, that simply by moving the gender question to the end of the test, you'd expect 4700 more female students to receive AP calculus credit every year(5).
I'm finding this bit confusing. You seem to be saying that Stricker and Ward found that they could eliminate stereotype threat by moving trait questions to the end of the test (i.e. no differences for target groups == no stereotype threat). But then you say that others re-analyzed their data and found that...they could eliminate stereotype threat by moving trait questions to the end of the test.
In other words, the structure of your writing implies a contrast, but the content implies the same finding by two different groups of researchers. Can you clarify?
Outlier, sorry that's confusing: they found no difference between the females/African Americans who got the gender questions prior to the test and those who got it after the test. It was the re-analysis that found such a difference for females.
As a physics professor (definitely a field where white males are overrepresented), I'm very glad to read about these studies. I will try to implement them in the future, and read my exams much more carefully to analyze for bias.
(BTW, I followed the link from Pandagon.)
Here are 5 very brief descriptions of stereotype threat experiments: http://tinyurl.com/3x692f
You might find this site of interest:
www.reducingstereotypethreat.org
There, we provide a review of the literature on stereotype threat and review the difference between the Stricker & Ward and Danaher & Crandall approaches.