A while back, I links dumped Josh Rosenau’s post Firing Bad Teachers Doesn’t Create Good Teachers, arguing that rather than just firing teachers who need some improvement, schools should look at, well, helping them improve. This produced a bunch of scoffing in a place I can’t link to, basically taking the view that people are either good at what they do or they’re not, and if they’re not, you just fire them and hire somebody else. I was too busy to respond at the time, but marked it down as something to come back to. So I was psyched when I saw this paper in Science about a scientific trial of a teacher coaching service, which claims that:
The intervention produced substantial gains in measured student achievement in the year following its completion, equivalent to moving the average student from the 50th to the 59th percentile in achievement test scores.
“Ah-hah!” I said, “Scientific proof that teachers can, in fact, be improved with some extra instruction.” So I sat down to go through the paper for ResearchBlogging purposes. Which is when I hit a problem, because the paper is kind of awful.
The awfulness isn’t primarily on the scientific side, which is reasonably sound. They ran a controlled trial in Virginia with 78 teachers and more than 2000 students, randomly assigning teachers to control and intervention groups. Teachers in the intervention group received coaching in making their classes more interactive, and regularly recorded themselves teaching, then sent the recordings off for review. Experts at the coaching service being tested reviewed the recordings, sent the teachers pointers on what they could do better, and followed up with a phone conversation.
The result wasn’t all that dramatic, but in the year after the coaching, the teachers from the intervention group did substantially better than those from the control group. Performance was measured by comparing students’ scores on the previous year’s state-mandated end-of-year test to their scores on the end-of-year test for the class being studied. The year after the trial, the intervention group’s students improved from a raw score of 479 the previous year to 488 for the year being studied, while the control group went from 495 the previous year to 482. That difference is statistically significant, and it’s the origin of the 50th-to-59th-percentile claim.
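If you want to see roughly where a percentile claim like that comes from, here’s a crude back-of-the-envelope version of the arithmetic in Python. The raw scores are from the paper’s supporting tables; the standard deviation of roughly 70 is the figure I mention below; and treating the simple difference in gains as a normal-distribution effect size is my simplification, not the paper’s regression-adjusted analysis, so it comes out a bit above their 59th-percentile figure:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Raw scores from the supporting tables (year after the trial)
intervention_gain = 488 - 479   # +9 points
control_gain = 482 - 495        # -13 points
diff_in_gains = intervention_gain - control_gain   # 22 points

sd = 70.0  # rough student-level standard deviation of the scores
effect_size = diff_in_gains / sd
percentile = normal_cdf(effect_size)
print(f"effect size ~ {effect_size:.2f} SD, "
      f"average student moves to ~ the {percentile:.0%} percentile")
```

The paper’s published number is smaller because they adjust for pre-test scores and demographics rather than just subtracting raw averages, but the logic is the same: express the gain difference in standard deviations, then read off where the formerly average student would land.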
So what’s awful?
Well, for one thing, while the intervention group showed a statistically significant improvement over the control group in the year after they did the intervention, the difference during the study was pretty minimal. The intervention group in that year went from 467 to 460, while the control group went from 470 to 464. They try to shrug this off, writing:
This result lends a cautionary note to these findings. It is, however, consistent with the idea that student gains in achievement would occur only after teachers had the benefit of a year’s worth of their own growth, such that students would actually experience enhanced teacher-student interactions over a substantial portion of their academic year.
That sounds an awful lot like retconning to me.
More significantly, because the study only considered two years, there’s no way to tell whether this is just a statistical fluke. Looking at just the pre-test scores, you see a pretty big spread: from 467 and 470 in the study year to 479 and 495 in the year after the study. If you’re going to say anything sensible about the effect of the intervention, you need more of a baseline. How much variation in these achievement scores do you see from one class to another without the intervention? The standard deviations of all those average scores are in the neighborhood of 70, so I would expect to see a good deal of jumping around from one class to the next. It’s conceivable that the whole effect here is just a matter of chance: if they did the same study again the next year, they might see the results reversed.
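To put a rough number on how much jumping around chance alone could produce, here’s a quick Monte Carlo sketch. The student-level SD of 70 is from the numbers above; the between-class SD, class count, and class size are my own guesses at plausible values (the paper doesn’t report a class-level variance), so take the output as an illustration of the method, not a claim about this particular study:

```python
import random

random.seed(0)  # reproducible run

STUDENT_SD = 70.0   # rough SD of individual student scores, per the paper
CLASS_SD = 20.0     # assumed between-class (teacher/cohort) SD -- a guess
N_CLASSES = 39      # ~78 teachers split into two groups
CLASS_SIZE = 26     # ~2000 students spread over 78 classes

def group_mean():
    """Average score for one group of classes, with class-level noise."""
    total = 0.0
    for _ in range(N_CLASSES):
        class_effect = random.gauss(0.0, CLASS_SD)
        total += sum(random.gauss(class_effect, STUDENT_SD)
                     for _ in range(CLASS_SIZE))
    return total / (N_CLASSES * CLASS_SIZE)

# How much does a group average wander purely by chance?
means = [group_mean() for _ in range(2000)]
avg = sum(means) / len(means)
sd_of_means = (sum((m - avg) ** 2 for m in means) / len(means)) ** 0.5
print(f"chance SD of a group average: ~{sd_of_means:.1f} points")
```

With these assumed inputs the chance spread of a group average comes out at a few points, which is the kind of baseline you’d want before deciding whether pre-test averages ranging from 467 to 495 reflect real cohort-to-cohort variation or just noise; crank the between-class SD up and the "it could all be a fluke" worry gets correspondingly worse.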
And then there’s the fact that the paper itself is long on edu-jargon and descriptions of their programs, and short on actual, you know, data. The only data presented in the paper itself is one bar graph, with two sets of two bars (which all by itself is a data presentation method whose ridiculousness is exceeded only by the two-point scatter plot in the next paper in that issue). The useful numbers are all buried in the Supporting Online Material, where you find reasonably informative data tables. The main text actually cites tables that can only be found in the online material, which strikes me as incredibly obnoxious (though for all I know, it’s standard practice in the education research world, in which case we need to line up education researchers and slap them all).
Even worse, when you look at the data tables, you find this table of results for the trial year:
Look closely at the row for the pre-post change in scores. See the problem? This is probably just a typo, but it doesn’t really speak well for the care taken when preparing this article. Which, I remind you, was published in Science, one of the world’s most prestigious journals.
So, as much as I would like to use these results to argue that it does, in fact, make sense to offer teachers targeted training to help them get better, there are just too many holes in this to take it seriously. I started out thinking “Hey, cool, science supports my position!”, but after reading the paper I was left asking “How did this get into Science?”
Allen, J., Pianta, R., Gregory, A., Mikami, A., & Lun, J. (2011). An Interaction-Based Approach to Enhancing Secondary School Instruction and Student Achievement. Science, 333(6045), 1034–1037. DOI: 10.1126/science.1207998