Infants match human words to human faces and monkey calls to monkey faces (but not quacks to duck faces)


From a young age, children learn about the sounds that animals make. But even without teaching aids like Old MacDonald's farm, it turns out that very young babies have an intuitive understanding of the noises that humans, and even monkeys, ought to make. Athena Vouloumanos from New York University found that at just five months of age, infants match human speech to human faces and monkey calls to monkey faces. Amazingly, this wasn't a question of experience: the same infants failed to match quacks to duck faces, even though they had more experience with ducks than with monkeys.

Vouloumanos worked with a dozen five-month-old infants from English- and French-speaking homes. She found that they spent longer looking at human faces when those faces were paired with spoken words than when they were paired with monkey or duck calls. The infants clearly expected human faces, and not animal ones, to produce speech, even when the words came from a language (Japanese) that they were unfamiliar with. However, the fact that it was speech was essential: human laughter failed to grab their attention in the same way, and when they heard it they showed no bias towards either human or monkey faces.

More surprisingly, the babies also understood the types of calls that monkeys ought to make. They spent more time staring at monkey faces that were paired with monkey calls than at those paired with human words or with duck quacks.


That's certainly unexpected. These babies had no experience with the sight or sounds of rhesus monkeys, but they 'got' that monkey calls most likely come from monkey faces. Similarly, they appreciated that a human face is an unlikely source of a monkey call, even though they could hardly have experienced every possible sound that the human mouth can make.

Perhaps they were just lumping all non-human calls and faces into one category? That can't be true, for then they would have matched the monkey faces equally well to monkey or duck calls. Perhaps they matched monkeys to their calls because they ruled out a link to the more familiar human or duck sounds? That's unlikely too, for the infants failed to match duck faces to quacks!

Instead, Vouloumanos believes that babies have an innate ability to predict the types of noises that come from certain faces, and vice versa. Anatomy shapes the sound of a call into an audio signature that's specific to each species. A human vocal tract can't produce the same repertoire of noises as a monkey's, and vice versa. Monkeys can produce a wider range of frequencies than humans can, but thanks to innovations in the shape of our mouth and tongue, we're better at subtly altering the sounds we make within our narrower range.

So the very shape of the face can provide clues about the noises likely to emerge from it, and previous studies have found that infants are very sensitive to these cues. This may also explain why they failed to match duck faces with quacks: duck visages are so vastly different from the basic primate design that they might not even register as faces, let alone as potential clues about sound.

If that's not enough, Vouloumanos has a second possible explanation - perhaps babies use their knowledge of human sounds to set up a sort of "similarity gradient". Simply put, monkey faces are sort of like human faces but noticeably different, so monkey calls should be sort of like human calls but noticeably different.

Either way, it's clear that very young babies are remarkably sensitive to the sounds of their own species, particularly those of speech. The five month mark seems to be an important turning point, not just for this ability but for many others. By five months, they can already match faces with voices on the basis of age or emotion, but only after that does their ear for voices truly develop, allowing them to tune in to specific voices, or to the distinct sounds of their native language.

Reference: PNAS doi: 10.1073/pnas.0906049106


Hmm. This may be a very stupid idea, but I wonder if the babies would've reacted differently if the ducks were shown head-on. You mention the possibility that duck faces were not registered as faces - could that have to do with the fact that the picture of a duck shown here doesn't show the basic two eyes, mouth, nose setup? Would the outcome be different if humans were shown in profile, or ducks in front view?

Also, how strong is the face shape/type of noise correlation in birds, given their different vocal apparatus? Maybe it's not even possible to predict the sound a bird makes from its face alone with any precision. (It would be interesting to see how well adults could match the vocalisations of unknown birds to the faces, though obviously, experience with similar/related birds could become a confounding issue...)

I wonder how good babies are at dog or horse or other mammalian faces. Including a variety of other mammals in a similar study could help test the "similarity gradient" idea.

Another question is how much of the duck effect is due to the babies' indifference to ducks as opposed to primates. What if, when they realised that the face was a duck and the noise wasn't interesting either, they just stopped caring?

That would explain the trend on the duck side of Experiment 3: duck faces are not awfully interesting, but human voices are, so the baby will keep looking, maybe expecting a human face to turn up. Less so with monkey calls, and with quacks, the baby might think that there's no chance that there will ever be a human there, so why bother looking?

(Although this hypothesis predicts a difference between human and duck noises when a monkey face is shown in Experiment 2, so I guess it's not that clever after all :))

I'm surprised by how poor babies are at making this kind of discrimination. There's only a very small difference in how long they stare at human, monkey and duck faces: you're talking 20% at most. Maybe it's the paradigm; maybe it would be clearer if each noise were played with all three pictures on show, rather than just measuring whether the baby looks at or away from a single face.

Or maybe our innate wiring isn't that good at all.

I wonder how good babies are at dog or horse or other mammalian faces. Including a variety of other mammals in a similar study could help test the "similarity gradient" idea.

Indeed - what I'd be really interested in is whether they can match face-sounds for different primates, say a gorilla compared to a macaque.

I agree that the duck shown in profile raises a possible doubt. But then, I don't think a head-on duck would show "the basic two eyes, mouth, nose setup" either: remember that ducks do not have forward-facing eyes.

Just random speculation: Has anyone ever had the feeling that a voice just "doesn't fit" with a person's face? Perhaps if there *is* a predictive ability hardwired into our brains, that phenomenon could be linked to it. Are there other studies showing people are able to link voices to specific faces? Baby studies? (I originally thought of adults, but then a baby study linking different voices to different faces would be better, as it avoids cultural information creeping in.)