The LA Times has taken it upon itself to rate school teachers in Los Angeles. To do this, the LA Times has adopted the ‘value-added’ approach (italics mine):
Value-added analysis offers a rigorous approach. In essence, a student’s past performance on tests is used to project his or her future results. The difference between the prediction and the student’s actual performance after a year is the “value” that the teacher added or subtracted.
For example, if a third-grade student ranked in the 60th percentile among all district third-graders, he would be expected to rank similarly in fourth grade. If he fell to the 40th percentile, it would suggest that his teacher had not been very effective, at least for him. If he sprang into the 80th percentile, his teacher would appear to have been highly effective.
Any single student’s performance in a given year could be due to other factors — a child’s attention could suffer during a divorce, for example. But when the performance of dozens of a teacher’s students is averaged — often over several years — the value-added score becomes more reliable, statisticians say.
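To make the quoted procedure concrete, here is a minimal sketch of the calculation as the article describes it: last year's percentile serves as the prediction, and a teacher's score is the average of actual minus predicted across that teacher's students. The teacher names and percentiles are made up for illustration, and the function name `value_added` is my own; the LA Times' actual model may well be more elaborate than this.

```python
from statistics import mean

# Hypothetical records: (teacher, prior-year percentile, current-year percentile).
# None of these numbers come from the LA Times data.
records = [
    ("Teacher A", 60, 80),
    ("Teacher A", 45, 50),
    ("Teacher A", 70, 66),
    ("Teacher B", 60, 40),
    ("Teacher B", 30, 35),
    ("Teacher B", 85, 76),
]

def value_added(records):
    """Average (actual - predicted) percentile per teacher.

    The prediction is simply last year's percentile, as in the quoted example.
    """
    gains = {}
    for teacher, prior, current in records:
        gains.setdefault(teacher, []).append(current - prior)
    return {teacher: mean(diffs) for teacher, diffs in gains.items()}

scores = value_added(records)
for teacher, score in scores.items():
    print(teacher, score)
```

Note that the output is a single average per teacher; the spread of the individual gains around that average is discarded.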
While I laud the attempt to approach this issue quantitatively, I have serious doubts about their methods (Note to LA Times: methodological issues don’t make an approach “controversial”; they can make it wrong). Let us count the ways:
1) Using percentiles. This makes everything zero sum. Take the example given of a student who moves from the sixtieth percentile to the eightieth. If the students previously ranked between the 61st and 80th percentiles perform exactly as they did before, they still get pushed down the rankings, because one more student is now ahead of them. I think the raw scores should be used instead. Which leads to…
2) Using percentiles, part deux. We really don’t know what percentiles actually mean. How much better in raw score (or some adjusted score if you want to weight questions) is the sixtieth percentile versus the fiftieth? I doubt it’s linear. A decrease from 60th to 50th could be a very small real decrease. Put another way, a supposedly whopping twenty-point increase might not be very impressive at all. We simply can’t tell without the underlying scores.
3) Using percentiles, the Lake Wobegon Edition. I’m not a fan of the “ninety percent of our students finish in the top ten percent of their class” philosophy. But the idea that a student who did relatively poorly last year should, if all teachers teach equally well (and they all could be doing a good job), continue to do poorly the next year seems defeatist, even by my standards. Talk about the soft bigotry of low expectations. With a standard curriculum, which the LA Times claims there is, there is an upper boundary effect. If teachers, regardless of where students start, are getting most of their pupils pretty close to where they need to be, then the variation, particularly at the upper end, will largely be wobble and noise, not teacher quality.
4) Brother, can I get an R² in here? The method described averages the student gains (which contain the aforementioned problems) for each teacher. Allow me to give you the short version of what I think about this approach:
AAARRRRGGGGGHHHH!!!!
The longer version is that we want to capture that variability. Ideally, for each student, there would be associated demographic information (just because students go to the same school doesn’t mean they come from identical backgrounds); if not, we would like to know something about the school the student attends (e.g., is it a ‘poor’ school?). Even without that information, we would like to know how important teacher differences are relative to the total variability (and if possible, other factors too). If teacher quality accounts for little of the variation, then, even if teacher effects are significant, that’s probably not what we should be focusing on.
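As a rough illustration of what I mean by teacher differences relative to total variability, the sketch below computes the share of the total variation in student gains that lies between teachers. That share is the R² from a one-way analysis of variance with teacher as the only factor. The gains are invented, the function name `teacher_variance_share` is hypothetical, and a real analysis would also include the demographic and school covariates mentioned above.

```python
from statistics import mean

def teacher_variance_share(gains_by_teacher):
    """Fraction of total variation in student gains that lies between teachers.

    Computed as the between-teacher sum of squares divided by the total
    sum of squares (one-way ANOVA R^2, teacher as the only factor).
    """
    all_gains = [g for gains in gains_by_teacher.values() for g in gains]
    grand_mean = mean(all_gains)
    ss_total = sum((g - grand_mean) ** 2 for g in all_gains)
    ss_between = sum(
        len(gains) * (mean(gains) - grand_mean) ** 2
        for gains in gains_by_teacher.values()
    )
    return ss_between / ss_total

# Hypothetical per-student gains; not real data.
gains_by_teacher = {
    "Teacher A": [12, -8, 5, -1],
    "Teacher B": [-10, 9, -3, 0],
}

share = teacher_variance_share(gains_by_teacher)
print(f"Share of variation between teachers: {share:.3f}")
```

In this made-up example the two teachers' averages differ by three points, yet almost all of the variation sits within each classroom rather than between them. Averaging alone hides exactly that. With classes this small the in-sample share is also biased upward, so treat it as a ceiling on the teacher effect, not an estimate of it.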
If the data, including the raw scores, are released to the public, this will be a very interesting exercise. As currently constructed, though, I’m concerned the shoddy approach could do more harm than good. Admittedly, I’m biased: if teacher quality were so obvious, we would have found it by now. The link between poverty and educational performance hits you between the eyes like a two-by-four, but teacher effects are pretty weak.
I suggest California just copy Massachusetts’ curriculum and funding levels and see how that works. Or we can embrace untested ideas. Because it’s not like kids matter or anything.