Understanding cocktail-party conversation: Why do we look where we do?

By dmunger on December 6, 2007.

When we are trying to understand what someone is saying, we rely heavily on how their face moves, and that movement informs our understanding of what is said. The classic example of this is the McGurk effect, where the same sound accompanied by different facial movements gets interpreted differently.

Take a look at this short video clip (QuickTime required) of me talking, with my voice muffled by what sounds like cocktail party conversation:

Can you understand what I'm saying? What about after I stop moving? Can you understand me in the second part of that clip? Go ahead and replay the video to see if you can hear it the second time through.

That's right, I said two three-word phrases, not just one. If you're like me, you only heard background noise during the second part of the clip. In fact, I'm curious as to whether anyone can understand me at all. Let's make this one a poll:

At the end of the post, I'll show the video with my face actually moving throughout, and we'll see if the results change.

Since the McGurk effect was discovered, researchers have studied precisely where we look when we watch someone speak, and found that we're not always looking at the mouth. Indeed, we look at speakers' eyes more often. Even more striking, we tend to look disproportionately at the right side of a speaker's face. Why the right side? Several studies have found that the right side of most speakers' faces is more expressive than the left side, so we appear to be focusing on the side of the face that offers the most information.

But what if the left side of a particular face was actually offering more information? Would we switch our focus to that side? A team led by Ian T. Everdell showed 28 college students a series of videos similar to the one I presented above. The students' eye movements were monitored with a tracking device. Speakers uttered one of six phrases, and, as above, sometimes their faces were static and sometimes they were moving. In addition, some of the time the faces were flipped, so what appeared to be the right side of the face was actually the left side in the original.

As expected, viewers could understand the moving faces more often (90 percent of the time) than the static faces (60 percent of the time). Also as expected, for the non-flipped faces, viewers indeed focused more of the time on the right side of the speakers' faces. This picture shows the results for one typical viewer:

Some viewers did focus more on the left than the right, but the vast majority of viewers focused on the right side of the speaker's face. So what about when the faces are flipped?

As you can see, there's practically no difference in the results. Whether the faces were presented in the original or mirrored form, nearly all viewers focused on the right side of the face. Whichever side they favored, viewers were consistent -- left-focusers focused on the left side for both normal and mirrored faces.

Everdell's team argues that if we focus on the right side of the face because there's more information available there to help us understand speech, we're not able to adapt very quickly to different speakers. When we're confronted with someone who's more expressive with the left side of their face, we're not able to instantly adapt and focus on that side of their face.

Oh, one last thing. Were you wondering what I said in the second half of that clip? Here's the unaltered original video:

Record your answers below. Let's see if we get a different result now.

Everdell, I.T., Marsh, H., Yurick, M.D., Munhall, K.G., Paré, M. (2007). Gaze behaviour in audiovisual speech perception: Asymmetrical distribution of face-directed fixations. Perception, 36(10), 1535-1545. DOI: 10.1068/p5852

Am I just hard of hearing? I simply couldn't hear you at all, either time...Not a word.

Me neither -- barely heard him

I did not hear either well enough to understand them, but I managed to catch a syllable or two; just enough to rule out most of the options in the multiple-choice.

I just guessed based on the first syllable I heard vaguely.

Looks like it's not just me... there is too much cocktail party. I couldn't hear you say anything.

I couldn't hear your voice at all. The only way I could make out any of the words was to read your lips.

In the first video I heard the word quickly at the end and whaddya know, it was right? Didn't hear the rest of it though. I didn't even hear the first phrase, I guess I'm not a skilled face reader.

I believe people have different ways of understanding people - people like me rely entirely on their hearing in such situations.

It is very difficult even to lipread you. Below a transcript of my inner voice.

Sometimes I feel that we ascribe too much to humans. My domain is finance and you don't want to know how people fly in the face of logic. Yeah, those rational agents.
Let's be careful about the sides here: I'll adopt the first person view. The crux becomes "that discrimination of upright faces in sheep preferentially engages the right temporal cortex, as it does in humans." (Mimmack et al, 2000) The thing with the sheep is to show evolutionary constancy, though conceivably you could transfer the research.
Fact is, the right hemisphere steers my left side, also that of my face. So the few people contemplating someone's LH side of the face are the enlightened ones. The bulk of people looking left merely accesses their right brain hemisphere.
Proposal for research: Pease (2006) makes a point that lies distort facial symmetry towards somebody's left side of the face (s.a.). Abstracting: When inverting the images, does the recognition of emotions and detection of lies increase since people get to read the richer side of the face while still working the right brain hemisphere?

You needed to offer an additional option in your polls: "I couldn't hear it at all". If such an option had been present I would have selected it for both polls: I couldn't even tell if you were speaking, much less tell which words you said.

In fact, I'll be completely unsurprised if you come back next week and tell us that you were conducting some experiment on your readers and that actually there were no words whatsoever in the audio, you're actually testing to see which text people choose when there's no reason to choose one over another.

Love your blog - it's highly entertaining as well as informative. It's one of the first things I look at when I open up my RSS reader. Thanks, guys!
- Pauline Wallin, Ph.D.
http://blog.teachmeinternet.com

For all the complaints about the demo, it's interesting to note that it's been one of the most dramatic demonstrations we've ever posted. Only about 20 percent of respondents got it right when they couldn't see my face moving, and about 90 percent responded correctly when they could.

In the actual study, respondents were about 60 percent accurate for the static images, so my demo is quite a bit more difficult, but still it's interesting to see how many people managed to respond correctly when they could see my face moving.

I couldn't understand the second phrase in the first video at all, but otherwise had no trouble making out what you were saying--but for a very particular reason, I suspect. I read through all the choices beforehand. Then it was easy to tell what was being said. The interesting thing is that even now, watching the first video, I effectively don't hear anything at all for the second phrase, even knowing that it's there, even knowing what's being said.

All I hear is the sound of a marketplace. And some faint voices that seem to say "bones play sleep" or something like that.

I'm so relieved to see that most other people couldn't distinguish it! I couldn't either, but I attributed it to my growing problem recognizing speech. My hearing is quite good, but when my wife and I watch a movie at home, I frequently turn to her and ask, "What'd he say?" Indeed, I can still recall watching that nomination speech by candidate Bush in 1988, where he said "mumble mumble mumble, No New Taxes!" and never figuring out what he said. ;-)

You picked phrases where only one could be correct if you can lipread at all -- the first had to start with an f or v, the last with a p or b, so that left only one option when you put up multiple choice. If you'd had other phrases that started with the same point of articulation, you might have had dramatically different results. I wouldn't have gotten either without the list of phrases -- all I could get was the basic phrase contours, although I did hear both the first time.

Ditto to #15. All I could tell for the first one was that it started with "F", and that there was an "L" sound with a hard sound (b or p) right before it. And I didn't catch the second phrase at all on that one.

What fun! I love these kinds of tests. I heard "quickly" at the end of the first video and was able to choose the right sentence. I think answering was greatly aided by the multiple choice format. I would not have been able to answer correctly for the first video if I didn't have a script to choose from. Now if only the next cocktail party I went to had the same assistance... ;)

I couldn't understand a word you said in the first part of the first video. Needless to say, I didn't understand what you said in the second part of it either. Both of these statements held true even when I put my ear right up to the speaker and didn't watch the video.

For the second video, my answer was based on reading your lips.

I'm guessing none of the tests were done with well-endowed women?

I also could not hear anything in the first clip and would have chosen that as an answer if it were an option. I only got the second one right because I could tell you were making a "Q" sound with your lips; couldn't hear the "Q" sound, though.

Interesting, though: I have often thought that I was below average at understanding what someone is saying in the presence of similar ambient noises.

I was unable to discern what you were saying until I fixed my view on your lips. It seems that the one I chose was the most voted one!

I also would have liked an "I have no idea" option, on both halves of both clips. I always have trouble like this at parties, though.

I'm slightly confused about the results. Do they mean that the subjects looked at the actual right-side of the face, or at the perceived right-side of the face? Right now I'm not sure how to interpret the graph and this result.

I think I'd learned to filter out some of the cocktail party noise in the second video, but nevertheless, I relied on lip-reading more than hearing, to determine what you'd said. I've worked with individuals with head injuries and cerebral palsy through a therapeutic horseback riding program, and since I'm almost always a horse-handler, I have the best view of the rider's face when we stop (I'm required to stand directly in front of the horse at that point, facing the rider). Invariably, I can understand what the rider is saying, even if the therapist and other volunteers cannot. After reading this post, I think that my ability to understand a speaker with upper motor neuron deficits has more to do with watching facial movements carefully, than with my hyperacusis (which is actually a detriment in noisy situations).

I'd be very interested to know about related research that might help therapists and caretakers better understand the speech of individuals with upper motor neuron lesions.

Heh, took me a couple listens to understand the first part of the first video. ^_^ I *definitely* couldn't tell what the second part of it was.

I think part of the dramatic difference might be that a lot of listeners simply marked down what they heard in the first section. Even though I had to guess on the first question, it was quite obvious to me that you weren't saying the same thing.

Once you moved with the audio, though, it was easy to hear.
