A while back, I Links Dumped Josh Rosenau's post Firing Bad Teachers Doesn't Create Good Teachers, arguing that rather than just firing teachers who need some improvement, schools should look at, well, helping them improve. This produced a bunch of scoffing in a place I can't link to, basically taking the view that people are either good at what they do or they're not, and if they're not, you just fire them and hire somebody else. I was too busy to respond at the time, but marked that down as something to come back to. So I was psyched when I saw this paper in Science about a scientific trial of a teacher coaching service, which claims that:

The intervention produced substantial gains in measured student achievement in the year following its completion, equivalent to moving the average student from the 50th to the 59th percentile in achievement test scores.

“Ah-hah!” I said, “Scientific proof that teachers can, in fact, be improved with some extra instruction.” So I sat down to go through the paper for ResearchBlogging purposes. Which is when I hit a problem, because the paper is kind of awful.

The awfulness isn’t primarily on the scientific side, which is reasonably sound. They ran a controlled trial in Virginia with 78 teachers and more than 2000 students, randomly assigning teachers to the control and intervention groups. Teachers in the intervention group received coaching in making their classes more interactive, and regularly recorded themselves teaching, then sent the recordings off for review. Experts at the coaching service being tested reviewed the recordings, then sent pointers to the teachers on what they could do better. They also followed up with a phone conversation.

The result wasn’t all that dramatic, but in the year after the coaching, the teachers from the intervention group did substantially better than those from the control group. Performance was measured by comparing each student’s score on the previous year’s state-mandated end-of-year test to their score on the end-of-year test for the class being studied. The year after the trial, the intervention group’s students improved from a raw score of 479 the previous year to 488 for the year being studied, while the control group dropped from 495 to 482. This difference is statistically significant, and that’s the origin of the 50th-to-59th-percentile claim.
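For what it’s worth, the percentile claim is easy to sanity-check. Under a normal distribution, moving the average student from the 50th to the 59th percentile corresponds to an effect size of about 0.23 standard deviations; with the SD of roughly 70 quoted in the supporting tables, that’s about 16 raw points. A quick sketch (the normality assumption is mine, not the paper’s):

```python
from statistics import NormalDist

# Effect size implied by "50th to 59th percentile", assuming
# normally distributed scores
d = NormalDist().inv_cdf(0.59)        # ≈ 0.23 standard deviations

# With the SD of ~70 raw points from the supporting tables, that
# effect size corresponds to roughly this many raw points:
raw_points = d * 70                   # ≈ 16

# The raw gap in gain scores between the two groups:
# (488 - 479) - (482 - 495) = 22 points
gain_gap = (488 - 479) - (482 - 495)

print(round(d, 2), round(raw_points, 1), gain_gap)
```

The 22-point gap in raw gain scores is a bit bigger than the ~16 points the quoted effect size implies, which presumably reflects the paper’s adjusted statistical model rather than the bare means.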

So what’s awful?

Well, for one thing, while the intervention group showed a statistically significant improvement over the control group in the year after they did the intervention, the difference during the study was pretty minimal. The intervention group in that year went from 467 to 460, while the control group went from 470 to 464. They try to shrug this off, writing:

This result lends a cautionary note to these findings. It is, however, consistent with the idea that student gains in achievement would occur only after teachers had the benefit of a year’s worth of their own growth, such that students would actually experience enhanced teacher-student interactions over a substantial portion of their academic year.

That sounds an awful lot like retconning to me.

More significantly, because the study only considered two years, there’s no way to tell whether this is just a statistical fluke. Looking at just the pre-test scores, you see a pretty big spread: from 467 and 470 in the study year to 479 and 495 in the year after the study. If you’re going to say anything sensible about the effect of the intervention, you need more of a baseline. How much variation in these achievement scores do you see from one class to another without the intervention? The standard deviations of all those average scores are in the neighborhood of 70, so I would expect to see a good deal of jumping around from one class to the next. It’s conceivable that the whole effect here is just a matter of chance– if they did the same study again the next year, they might see the results reversed.
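To put rough numbers on that intuition: with a student-level SD around 70 and classes of about 25 students (my assumption, from ~2000 students spread across 78 teachers), pure sampling noise makes a single class’s average jump around by about 14 points, one standard error. A back-of-the-envelope sketch:

```python
import math

sd_individual = 70   # student-level SD, from the supporting tables
class_size = 25      # assumption: ~2000 students / 78 teachers
n_classes = 39       # teachers per group, with 78 split evenly

# Chance variation (standard error) in a single class's average score
se_class = sd_individual / math.sqrt(class_size)              # ≈ 14 points

# Chance variation in a whole group's average, if students were
# independent; clustering by teacher makes the real number larger
se_group = sd_individual / math.sqrt(class_size * n_classes)  # ≈ 2.2 points

print(round(se_class, 1), round(se_group, 1))
```

Note that the pre-test gaps between the groups (495 vs. 479, 470 vs. 467) are much bigger than that naive group-level standard error of ~2 points, which suggests teacher-to-teacher variation, not student-level noise, dominates; that’s precisely why a longer baseline is needed before reading anything into a one-year difference.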

And then there’s the fact that the paper itself is long on edu-jargon and descriptions of their programs, and short on actual, you know, data. The only data presented in the paper itself is one bar graph, with two sets of two bars (which all by itself is a data presentation method whose ridiculousness is exceeded only by the two-point scatter plot in the next paper in that issue). The useful numbers are all buried in the Supporting Online Material, where you find reasonably informative data tables. The main text actually cites tables that can only be found in the online material, which strikes me as incredibly obnoxious (though for all I know, it’s standard practice in the education research world, in which case we need to line up education researchers and slap them all).

Even worse, when you look at the data tables, you find this table of results for the trial year:

[table of trial-year pre/post scores, reproduced from the supporting online material; image not preserved]
Look closely at the row for the pre-post change in scores. See the problem? This is probably just a typo, but it doesn’t really speak well for the care taken when preparing this article. Which, I remind you, was published in Science, one of the world’s most prestigious journals.

So, as much as I would like to use these results to argue that it does, in fact, make sense to offer targeted training to teachers to help them get better, there are just too many holes in this to take it seriously. Where I started out thinking “Hey, cool, science supports my position!”, after reading it I was left asking “How did this get into Science?”

Allen, J., Pianta, R., Gregory, A., Mikami, A., & Lun, J. (2011). An Interaction-Based Approach to Enhancing Secondary School Instruction and Student Achievement. Science, 333 (6045), 1034-1037. DOI: 10.1126/science.1207998


  1. #1 Whomever1
    August 23, 2011

    Does the article state anything about the actual content of the coaching? I’m not a subscriber to Science, so I can’t read it.

  2. #2 Chad Orzel
    August 23, 2011

    The relevant paragraphs from the paper are pretty content-free:

    This study reports results of a randomized controlled trial of a coaching program—the My Teaching Partner–Secondary program (MTP-S)—focused on improving teacher-student interactions in secondary classrooms with students aged 11 to 18 so as to enhance student motivation and achievement. The program targets the motivational and instructional qualities of teachers’ ongoing, daily interactions with students. MTP-S is conceptualized within the Teaching Through Interactions framework (fig. S1), a content-independent framework that emphasizes the extent to which student-teacher interactions influence student academic motivation, effort, and achievement (18).

    MTP-S uses the domains of the Classroom Assessment Scoring System–Secondary (CLASS-S) (19) to operationalize this framework by providing clear behavioral anchors for describing, assessing, and intervening to change critical aspects of classroom interactions. These domains focus on the extent to which interactions build a positive emotional climate and demonstrate sensitivity to student needs for autonomy, an active role in their learning, and a sense of the relevance of course content to their lives. Focus is also placed on bolstering the use of varied instructional modalities and engaging students in higher-order thinking and opportunities to apply knowledge to problems. Overall, the intervention is designed to enhance the fit between teacher-student interactions and adolescents’ developmental, intellectual, and social needs in an approach that aligns closely with elements of high-quality teaching that have been identified as central to student achievement (9).

    The MTP-S intervention integrates initial workshop-based training, an annotated video library, and a year of personalized coaching followed by a brief booster workshop. During the school year, teachers send in video recordings of class sessions in which they are delivering a lesson. Trained teacher consultants review recordings that teachers submit and select brief segments that illustrate either positive teacher interactions or areas for growth in one of the dimensions in the CLASS-S. These are posted on a private, password-protected Web site, and each teacher is asked to observe his or her behavior and student reactions and to respond to consultant prompts by noting the connection between the two. This is followed by a 20- to 30-minute phone conference in which the consultant strategizes with the teacher about ways to enhance interactions using the CLASS-S system. This cycle repeats about twice a month for the duration of the school year.

    The CLASS-S reference is:
    19. R. C. Pianta, B. K. Hamre, N. Hayes, S. Mintz, K. M. LaParo, Classroom Assessment Scoring System–Secondary (CLASS-S)

    The paragraphs in the supporting online material are nearly identical to the above, with little more detail. They’re not as convenient to quote, though.