My talk at the AAAS meeting was part of a symposium on the results from the 2008 Trends in International Mathematics and Science Study (TIMSS) Advanced. This is an international test of math and physics given to high-school students in nine countries (Armenia, Iran, Italy, Lebanon, Netherlands, Norway, Russia, Slovenia, Sweden), part of an ongoing series whose previous round was given in 1995. As part of the preparation for the talk, I got all the released items from TIMSS 2008, including score breakdowns and demographic information. My own analysis of this was fairly rudimentary– I was asked to comment on the content of the physics items, not the scores themselves– but it’s always fun to play with numbers. And it was really interesting to hear from the other speakers about the international results.
Alka Arora from the International Study Center gave an overview, which was a little depressing. The results were, almost across the board, worse than on the earlier test. Both math and physics scores were down in every country since 1995, though the drop wasn’t statistically significant for Russia and Slovenia. Norway’s scores dropped by almost three-quarters of a standard deviation, though, and Sweden’s drop was similar. My sketchy notes don’t mention any country seeing an increase in scores.
So, what’s the problem?
Liv Sissel Grønmo gave a presentation about the Norwegian experience, and also provided English translations of the report she helped prepare on these results. Her answer, at least, was that the decline reflects a breakdown in basic abstract reasoning skills. Looking at the items that were repeated from the earlier test, the biggest drops came on problems requiring abstract thinking: the percentage of students correctly answering a straightforward derivative question fell from 40 to 22 percent, and the percentage able to identify a continuous function dropped from 56 to 35 percent. The drops were less dramatic for questions that were more applied in nature– questions where students could obtain an answer by direct calculation, and that sort of thing.
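(For a rough sense of the level involved: a “straightforward derivative question” here means something like “if f(x) = x^3 – 3x, what is f′(x)?”, with answer 3x^2 – 3. That’s my own illustrative example, not the actual released item, but it’s the flavor of problem at issue– routine symbolic differentiation, not a trick question.)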
Grønmo attributed this to a shift in teaching away from skill-based instruction and drill toward a more conceptual orientation. This view was shared by the expert asked to assess the mathematics items, Richard Askey of the University of Wisconsin. He blasted a number of the items as poorly designed for assessing abstract mathematical knowledge, and complained that those that did ask reasonable questions mostly gave depressing results. I didn’t agree with everything he had to say– he really pounded on a couple of multiple-choice items that would let you find the correct answer by process of elimination, while I tend to think you need a pretty good understanding to be comfortable with that sort of approach– but he did have some interesting ideas about ways to redesign this sort of test.
The other not-me speaker, Barbara Japelj Pavešić, talked about the results from Slovenia, which, despite being a small country, has an impressive 40% of its students taking advanced mathematics, and scored reasonably well on both sections of the test. Her talk was mainly about digging into the data they have on student scores and demographics to statistically identify the characteristics of high-scoring students. I was starting to fret about my own talk at that point, but it sounded like an interesting process, even if the results were not all that surprising (the highest-scoring cohort basically mapped to geeky male students, from what I could tell).
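I don’t know exactly what statistical machinery went into that analysis, but generically, “what distinguishes the top scorers?” is the kind of question you could attack with a simple regression on the student background data. A minimal sketch of that sort of approach, with entirely hypothetical file and column names:

```python
# A generic sketch of this kind of analysis -- the file name and columns
# below are hypothetical stand-ins, not the actual Slovenian data set.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("timss_advanced_slovenia.csv")
features = ["gender", "hours_homework", "parent_education", "likes_math"]
X = pd.get_dummies(df[features], drop_first=True)  # encode categorical variables
y = (df["math_score"] >= df["math_score"].quantile(0.9)).astype(int)  # top decile

model = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")  # positive coefficients track with top scores
```

The actual study presumably did something more careful than this (proper sampling weights, multilevel models, and so on), but the basic move– regress an indicator for “high scorer” on demographic and attitude variables and see which coefficients come out large– is the same.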
As for my own rudimentary analysis, you can get most of it from my slides. A couple of things struck me as odd and possibly interesting (though the sample of questions I had to look at was so small that these are as likely to be a complete fluke as anything else):
— While the international average scores in all categories were higher for boys than girls, Lebanon had the average difference between males and females favor the girls in all four content areas, and Armenia had girls outscore boys in three of the four areas. If you gave me that list of countries, I wouldn’t’ve picked those two as the ones to have this distinction, but good for them.
— While the overall scores for multiple-choice questions were higher than the overall scores for free-response questions, the gender gap was smaller on the free-response questions. That is, the girls did relatively better on free-response questions than on multiple choice. And that holds even if you control for the lower overall scores: it’s not just that the absolute difference in scores is smaller (which might happen simply because both groups scored worse on free-response problems); the gap as a fraction of the average score for that category is smaller too (a quick numerical sketch of the distinction follows below). I have no idea what this means– it might very well be a complete fluke– but I don’t doubt that an enterprising soul could spin a story about how this is a direct result of conditions on the African savanna back in the day.
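To make the absolute-versus-relative distinction concrete, here’s that sketch, with made-up numbers– these are not the actual TIMSS figures, just an illustration of the two ways of measuring the gap:

```python
# Hypothetical percent-correct scores, NOT the actual TIMSS numbers;
# they just illustrate absolute vs. relative gender gaps.
scores = {
    "multiple choice": {"boys": 60.0, "girls": 55.0},
    "free response":   {"boys": 40.0, "girls": 38.0},
}

for category, s in scores.items():
    absolute = s["boys"] - s["girls"]
    relative = absolute / ((s["boys"] + s["girls"]) / 2)  # fraction of category average
    print(f"{category}: absolute gap = {absolute:.1f} points, "
          f"relative gap = {relative:.1%}")
```

With these invented numbers, the free-response gap is smaller both in raw points (2.0 vs. 5.0) and as a fraction of the category average (about 5% vs. 9%), which is the shape of the pattern in the actual data.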
Anyway, it was an interesting session and an interesting overall experience. It’s also a little difficult to reconcile with some of the things said by a speaker in one of the Sunday sessions, but that will have to wait for another post.