The six-second teacher evaluation

Family lore has it that my uncle was influential in instituting what is now a fixture in college education: student evaluation of college instructors. He was class president at the University of Washington in the 1960s, when tensions between students and the school administrators were high, and he suggested implementing one of the first student course evaluation systems in the nation as a way to address the problem. Needless to say, the idea caught on.

While college faculty complain unceasingly about the fairness of the now nearly universal student course evaluation system (I did it myself, back when I taught college courses), it has in general been shown to be a relatively reliable indicator of teacher effectiveness, correlating positively with other measures such as faculty and administrator evaluation, as well as actual student learning.

From the teacher's perspective, however, the students can't possibly have enough information to make an effective evaluation of their teaching. A college course represents just a tiny sliver of the total knowledge in a discipline, and even after a semester in a college course, students are in no position to make judgements that will impact a faculty member's entire career.

A 1993 study by Nalini Ambady and Robert Rosenthal found just the opposite: students actually need much less information to make judgements that accurately predict end-of-semester evaluations.

Ambady and Rosenthal extracted three 10-second video clips of each of 13 teachers from tapes of entire class sessions. These 39 clips were randomized and presented without sound to 9 female college students, who rated them on a scale from 1 to 9 for a variety of behaviors, including "attentive," "confident," and "supportive." The ratings were highly consistent between judges, with a global reliability measure of .85 overall.
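To get a feel for what a reliability of .85 across 9 judges implies, here is a minimal sketch. It assumes the figure is an "effective reliability" of the Spearman-Brown kind, which the description above does not actually state, and the function names are purely illustrative.

```python
def effective_reliability(mean_r: float, n_judges: int) -> float:
    """Spearman-Brown formula: reliability of the average of n judges' ratings."""
    return n_judges * mean_r / (1 + (n_judges - 1) * mean_r)


def implied_mean_r(reliability: float, n_judges: int) -> float:
    """Invert Spearman-Brown to recover the average judge-to-judge correlation."""
    return reliability / (n_judges - (n_judges - 1) * reliability)


# ASSUMPTION: the .85 reported above is a Spearman-Brown effective reliability.
mean_r = implied_mean_r(0.85, 9)
print(round(mean_r, 2))  # about 0.39
```

If that assumption holds, any two judges agreed only moderately (around .39), and it is the pooling of nine judges that produces the high overall figure.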

These teachers were rated by their own students at the end of the term on a general effectiveness scale. I've created a table below to show the correlation of the ratings of the 30 seconds' worth of clips with the end-of-semester rating:

Rated behavior    Correlation (r)
Accepting         .50
Active            .77
Attentive         .48
Competent         .56
Confident         .82
Dominant          .79
Empathetic        .45
Enthusiastic      .76
Honest            .32
Likable           .73
(Not) anxious     .26
Optimistic        .84
Professional      .53
Supportive        .55
Warm              .67

The significant correlations -- 9 of the 15 measures -- are in boldface type. Concerned that their measure may only reflect a cursory evaluation of the physical attractiveness of the teachers, Ambady and Rosenthal had separate judges rate the teachers for attractiveness based on still photos. Even after controlling for physical attractiveness, the correlation between student ratings and the video clip ratings was still significant. Apparently after seeing just 30 seconds of nonverbal behavior, we can reliably predict teaching ability.
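For a sense of how large a correlation has to be to count as significant with only 13 teachers, here is a rough sketch. It assumes a two-tailed test at the .05 level, which the post does not specify, and it uses the standard tabled critical t value for 11 degrees of freedom; the variable and function names are my own.

```python
import math

N_TEACHERS = 13      # number of teachers in the study
T_CRIT_DF11 = 2.201  # ASSUMPTION: two-tailed .05 critical t for n - 2 = 11 df

# Rounded correlations copied from the table above.
correlations = {
    "Accepting": 0.50, "Active": 0.77, "Attentive": 0.48, "Competent": 0.56,
    "Confident": 0.82, "Dominant": 0.79, "Empathetic": 0.45,
    "Enthusiastic": 0.76, "Honest": 0.32, "Likable": 0.73,
    "(Not) anxious": 0.26, "Optimistic": 0.84, "Professional": 0.53,
    "Supportive": 0.55, "Warm": 0.67,
}


def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| that reaches t_crit, from t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


threshold = critical_r(T_CRIT_DF11, N_TEACHERS)
passing = sorted(name for name, r in correlations.items() if abs(r) >= threshold)

print(round(threshold, 3))  # 0.553
print(passing)
```

Under these assumptions the bar sits at about .55, so with the rounded table values eight measures clear it and Supportive (.55) lands almost exactly on the line. The count of nine reported above presumably reflects unrounded values or a slightly different test.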

Not satisfied with comparing results only to student evaluations, Ambady and Rosenthal repeated the experiment with videotapes of high school teachers, and compared them to effectiveness ratings provided by the school principal. The results were comparable.

So how thin a slice of behavior is needed to accurately predict teaching ability? The researchers had an assistant unfamiliar with the task randomly select 5- and 2-second clips from the original 10-second clips. They repeated the rating task with a new group of female college students. The ratings for these shorter clips were less reliably correlated with the teacher effectiveness ratings, but amazingly, the 2-second ratings for college teachers were still significantly correlated with overall end-of-semester effectiveness ratings. Though these were the only short clips that were significantly correlated with effectiveness, neither the 5-second nor the 2-second ratings were significantly different from the 10-second ratings. What's more, if the short ratings for college and high school teachers are combined, they do significantly predict effectiveness ratings.

So we do appear to be quite effective at making judgements about teaching ability even after viewing only a total of 6 seconds of actual teaching, and without even hearing the teacher's voice.

So what does this suggest about my uncle's system of student teacher evaluations? Not much, directly, but it does suggest that students who choose courses by visiting lots of different classes during the first week may not be any less rational than those who pore over the student-produced faculty guidebook.

Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64(3), 431-441.

Does this translate into the "first impression"? Is it important to start the very first class of the semester in a particular way and then just try to sorta keep it up for the rest of the time without making significant gaffes?

Yes, I think the first impression is incredibly important. I'm not sure whether that's actually what's going on here -- I suspect it's more likely the case that people who give good first impressions are very likely to follow through on that for the entire semester. The point is that very small bits of our behavior are incredibly revealing to others.

I spent nearly 20 years teaching IT in a corporate setting and now run instructor training classes. These results are fascinating - thanks, but they don't surprise me. Anyone who has taught in industry is used to being evaluated by their students at the end of what may be just a half- or one-day course. You get to understand that this is a good assessment of your ability to please your students.

However, it doesn't follow that the course is effective. What matters more is what the students do during the class. That's when the learning takes place, and that is more to do with content and planning than instructional skills. I try to get across the message that you don't have to be a great instructor to give a great class (and conversely, a great instructor can give a poor class).

By Mark Frank (not verified) on 01 May 2006

We use student evals as part of our annual performance/self-evaluation review. We're a small school, so word travels fast if a new teacher is bombing. The evals are merely documentation. The administration uses the student feedback only as part of the overall picture. Personally I like the feedback, since I understand the students' opinions ultimately do not usually endanger my further employment.

In physics, there is evidence that student evaluations do not correlate very well at all with how much students are learning.

How is student learning evaluated in the studies that say that student learning correlates well with student evaluations?

Also, all those characteristics that were judged from the silent clips -- they all say something about the teacher's stage presence. None of them really seems to say anything about how much the students are learning. How relevant is this then, really? Are we more concerned about the performance than what the class is really supposed to be for?


Very interesting, but I agree with the previous comments that the behavioral features that show the highest correlation between 30-second and whole-course evaluations are related to teacher self-confidence and image more than to objective teaching effectiveness (not that the former are unimportant for the latter - see below - but just that they are distinct features).

It is of course not surprising that we pick up other people's self-confidence and assertiveness levels very rapidly - we have strong unconscious, instinctive mechanisms that were honed precisely to deliver and gauge human signals related to dominance and social status.

It is quite possible therefore that subjective teaching evaluations from students may in large part be related to the same features - we simply think a self-confident teacher is more effective than a less confident one, regardless of whether we actually learn more or better from one or the other.

If that's the case, the surprise is not that 30-second evaluations are as good as whole-course evaluations, but that they both are ultimately so superficial. Is it possible that we can fool ourselves into thinking a teacher is better than he/she actually is, simply based on their projected attitude? (Of course, there is the additional complication that a large part of teaching is to encourage learning by communicating enthusiasm for the subject, which is easier for an assertive person to do convincingly.)

The real test would be to see whether there is a correlation between 30-second evaluations and some objective metrics, such as standardized test results.

By Andrea Bottaro (not verified) on 01 May 2006

As someone who has been on both ends of the evaluation schemes [as the publisher and editor of the SLATE Supplement at Berkeley and 12 years of college teaching] I can concur that there are random short stretches of teaching that influence the evaluation of a teacher.

What I tried to get across to students was that what they learned in class [and how well I entertained them] was not the key to their education, but it was the way I tried to show how to approach learning about a subject, any subject. What they got from me was 80% useless info and 20% crucial. The problem was that neither they nor I could tell at that time which 20% was going to be important.

By natural cynic (not verified) on 01 May 2006

You mentioned "it has in general been shown to be a relatively reliable indicator of teacher effectiveness, correlating positively with other measures such as faculty and administrator evaluation, as well as actual student learning"

My understanding from a special issue of American Psychologist several years ago was that student ratings were highly correlated with expected grades; so highly correlated, in fact, that their use is dubious. Is the above suggesting that student ratings are correlated with INDEPENDENT measures of student learning, or are they simply correlated with end-of-term exam scores? Perhaps there is some recent study I've not seen...

By Eric Durbrow (not verified) on 01 May 2006

Malcolm Gladwell's book "Blink" explores phenomena like this in some detail. It's written for a very general audience, but a good read nonetheless.

Have you heard of the study that can predict the success of a marriage in less than a second? The key to that study is that the observer can't watch just any second of footage and guess a couple's future; they would have to watch the particular parts that contain the revealing behaviours.

I think this is relevant to this study. The article did not specify which 10 seconds were extracted. It could have been 10 seconds in the beginning, middle or end of a session. What I'm guessing was important about those 10 seconds is that they demonstrated those revealing behaviours.

My point? I think that the impression of an instructor hinges on more than the first impression. I think it is important for the first impression to be intentional, but I do not see evidence to support that the first 10 minutes will set the impression of the facilitator in stone. Think about how we create rapport with our participants. Many times it comes later in the program and that's okay.

One more point I wanted to mention is related to the fact that they showed the clips without sound. That may actually make it easier for people to assess a person's performance since we unconsciously take signals from body language over what we hear (actions speak louder than words). By eliminating the sound, you are allowing them to focus on the indicators that more significantly impact our perception of performance.

This was a very thought-provoking article. I hope that people embrace the general nature of the results and avoid the quick-fix mentality (you know how we are drawn to quick-fixes in this industry!).

Instructional designers evaluate learning using 4 or 5 different levels of measurement.

1 - The smile sheet (student evaluations)
2 - Learning (how well they did on the test)
3 - Transfer (applying knowledge on the job)
4 - Results (new behaviors solved the original skill gap)
5 - ROI (improved performance benefitted the organization)

So, you can see that student evaluations are both the lowest level and the foundation of learning. After all, liking the class isn't going to help you with your job if the instruction isn't valuable. But if you hate the class, it's unlikely that you will take anything away with you.

Additionally, personal presentation has a lot to do with how much students absorb. Two facilitators covering exactly the same materials can produce wildly different results in their students. One may be more attuned to their students and seem more open to questions, while another - who may be just as knowledgeable and helpful - may have a slightly brusque manner that shuts his or her students down.

So, while the smile sheet is not the be-all-and-end-all of evaluating the success of training, it's not to be dismissed lightly either.

Let me take an evolutionary psychology approach--it looks to me like human beings are well-adapted to make rapid and accurate judgements about other human beings and their characters, a skill we might expect them to have evolved. The question is how this relates to teaching. Have we evolved a similar capacity to eyeball teaching effectiveness? This seems highly unlikely from an evolutionary psych perspective.

What seems much more likely is that people who are confident, assertive, competent, enthusiastic, warm, etc. make good teachers--and not necessarily for their big brains. They may not have as rich a command of the depth and breadth of their fields as other professors who get lower evaluations, but their personal qualities make students come to class, pay attention and work for approval. This correlates with increased evaluation scores for personal attractiveness, as well. The terrible truth is that attractive instructors may actually be more effective teachers, precisely because they make students want to come to class, pay attention, and work for instructor approval. I have long been aware that my high teaching evaluations come almost entirely from my enthusiastic (near manic) energy levels in the classroom, rather than from any particular command of the subject or great pedagogical technique. Entertainment may seem like the most superficial and shallow of techniques, but it may perversely generate the right results.

Finally, on a couple of occasions in my career I've had to begin teaching in a new field with little preparation (such as moving from literature to film studies)--perversely, my evaluations were higher in those courses. My guess is that, precisely because I didn't know the material very well, I was very interested in it, and my fascination was contagious.

By Rob Rushing (not verified) on 07 Dec 2006

I wouldn't say that administrative evaluations are any better a judge of teaching ability. Administrators probably know less than students about what happens in a classroom and they're just as biased regarding personal preferences for teachers who are attractive, smiling, etc. I think that's why it's so important that we look at ACTUAL learning to see if these evaluations are correct. I really appreciate Rushing's comment noting that we do learn more from attractive, enthusiastic people regardless of whether they are "good" teachers or whether they know their material.