How to make an effective computerized imitation of a real person

Take a look at this video:

You may have seen it before -- it's the work of a CGI animation studio that takes the motions of human actors and turns them into animated models. That lets the studio put incredibly realistic figures in impossible situations, like on Mars, or swimming in lava, or whatever an animator can conceive of.

But the advent of realistic simulations such as this makes it clear that people need to be more aware than ever of the potential for digital fraud. We now have email spam, but in the future we might have similar computerized instant messaging spam, or even video messaging spam. What does it take to determine if a video image of a human represents a real person or a computer simulation? It can actually be easier than you might think to create a computer simulation that is convincing to humans.

Consider this simple experiment led by Jeremy Bailenson. Thirteen pairs of students were set up at computer terminals in different rooms, and given only one way to "communicate" with each other: pressing the space bar. Depressing the space bar caused a light on the other user's screen to change color, and releasing it caused it to return to the original color. Meanwhile, a second light on the other user's screen was controlled by one of several possible types of simple computer simulations. So the display looked like this:

[Figure: the display shown to each participant (Bailenson1.gif)]

The job of each participant was to convince the other participant that he or she was human, while simultaneously figuring out which flashing dot represented the actions of her/his partner.

Each trial lasted 60 seconds, after which the partners tried to guess which dot was "human" and which was computer-generated. They also rated their confidence in their guess.

The computer used five different possible tricks to fool the humans:

  • Play a pre-recorded script generated by a human
  • Random
  • Play a pattern such as the Morse code "SOS" signal, at gradually increasing/decreasing speeds
  • Mimic the human: play back her/his actions with a four-second delay
  • Alternate between mimicry and a pattern
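
To make the "mimic" trick concrete, here is a minimal Python sketch of a delayed playback. It is not the researchers' code: the class name DelayedMimic, its methods, and the use of timestamps in seconds are all assumptions made for illustration. Only the idea of replaying the human's key presses four seconds later comes from the study as described above.

```python
from collections import deque


class DelayedMimic:
    """Replays a partner's press/release events after a fixed delay.

    Hypothetical illustration; names and structure are assumptions.
    """

    def __init__(self, delay_s=4.0):
        self.delay_s = delay_s
        self._pending = deque()  # (time_due, pressed), in arrival order
        self.pressed = False     # current state of the simulated light

    def observe(self, t, pressed):
        """Record that the human pressed (True) or released (False) at time t."""
        self._pending.append((t + self.delay_s, pressed))

    def state_at(self, t):
        """Apply every event due by time t and return the simulated light state."""
        while self._pending and self._pending[0][0] <= t:
            _, self.pressed = self._pending.popleft()
        return self.pressed


mimic = DelayedMimic()
mimic.observe(0.0, True)    # human presses the space bar at t = 0 s
mimic.observe(1.5, False)   # human releases it at t = 1.5 s
print(mimic.state_at(4.0))  # the simulated light turns on four seconds later
```

The sketch assumes state_at is called with non-decreasing times, as it would be in a display loop.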

So how often were the humans correct? Here are the results:

[Figure: results of the guessing task (bailenson2.gif)]

For most of the computer's strategies, humans did no better than chance (although the trend was toward accuracy). But in the "mimic" condition, humans were fooled more than 60 percent of the time. All it took for the computer to reliably deceive was to copy exactly what the humans did.

Next the researchers moved to a virtual environment. This time, a computer-generated character read a persuasive speech while viewers watched wearing a three-dimensional virtual reality headset. The headset detected head movements of the viewer in order to generate a more realistic virtual environment. However, this also allowed the computer-generated character to mimic the viewer's movement in one of three ways (again delayed by four seconds):

  • Mirror: An exact-mirror-image of the viewer's movements
  • Congruent: A mirror-image of right-left movements, but opposite of up/down movements
  • Switch: the viewer's left-right movements were reproduced by the computer as up-down movements, and vice versa
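
The three conditions can be read as simple transformations of the viewer's head movement. The Python sketch below is only one way to write that reading down, following the descriptions in the list above. The function name, the (yaw, pitch) encoding, and the sign convention are assumptions: both values are taken to be measured in screen space as the viewer sees it, so a mirror image moves in the same on-screen direction as the viewer. The four-second delay is left out for brevity.

```python
def mimic_head_movement(yaw, pitch, condition):
    """Map a viewer's head movement to the character's movement.

    yaw   -- left-right movement, in screen space (assumed convention)
    pitch -- up-down movement, in screen space (assumed convention)
    Hypothetical illustration; not taken from the study's software.
    """
    if condition == "mirror":
        # Exact mirror image: same on-screen direction on both axes.
        return (yaw, pitch)
    if condition == "congruent":
        # Mirror image left-right, opposite direction up-down.
        return (yaw, -pitch)
    if condition == "switch":
        # Left-right movement becomes up-down movement, and vice versa.
        return (pitch, yaw)
    raise ValueError(f"unknown condition: {condition!r}")


for condition in ("mirror", "congruent", "switch"):
    print(condition, mimic_head_movement(10.0, -5.0, condition))
```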

Viewers were asked if they agreed with the speech, and they answered several evaluative questions about the computerized speaker: how trustworthy, warm, and informative was he/she (the speaker's gender was matched to the viewer)? They were also asked if they noticed anything unusual about the speaker.

Here are some of the results:

[Figure: some of the results from the virtual-environment experiment (bailenson3.gif)]

In nearly all the conditions, if viewers didn't detect the computer was mimicking their own movements, they rated the computer higher, including even their level of agreement with the computer's statements. It's quite clear that a simple way for computerized spam engines to impress humans is to imitate their own actions -- as long as the computer isn't so obvious as to get detected. In this experiment, humans were much more likely to detect the mirror-image imitation by the computer, presumably because they see themselves in the mirror every day.

Bailenson, J., Yee, N., Patel, K., & Beall, A. (2008). Detecting digital chameleons. Computers in Human Behavior, 24(1), 66-87. DOI: 10.1016/j.chb.2007.01.015

I'm not really sure I understand the blue graphs. Also, the meaning of switch doesn't seem to be well-explained. Can we see examples of that?

This really looks interesting. I'd like to understand the implications a little better.

By Greg Padilla (not verified) on 09 Oct 2008

I'm not really sure I understand the blue graphs. Also, the meaning of switch doesn't seem to be well-explained. Can we see examples of that?

This really looks interesting. I'd like to understand the implications a little better.

By Greg Padilla (not verified) on 09 Oct 2008

Hmm. So the first comment was from Greg Padilla, and the second comment was from a computer simulating Greg Padilla? Or maybe it was vice versa, I dunno. This is too sinister.

By Mark Stevens (not verified) on 09 Oct 2008

Just a note about the video. Only the face is CG. It is superimposed on a live plate. The technology is still amazing, including the tracking, rendering and animation, but some of the verisimilitude is taken from the realism of the live plate.

You still need an actor for this to work, i.e., human involvement. What I'd like to see is the same performance from a completely CG driven character. The "Uncanny Valley" is still too wide to bridge that gap (see Beowulf).

to me they are so distinctly different. and their voices are even more different... i don't understand people honestly confuse them. :) great blog btw. i added a link to you.