Does sound need to be perfectly synchronized with what we see?

Take a look at this video of a professional drummer playing the conga:

It's easy to see that the sound coming from the drum is perfectly synchronized with the motion of the drummer's hands. Or is it? When a sound enters your ear, it takes less than 1 millisecond for the signal to travel from the outer to the inner ear, where it can be perceived by the brain. The equivalent process in the eye takes about 50 milliseconds. Then there is the physical difference between the speed of light and the speed of sound. If the drummer is between 15 and 20 meters away, the extra time the sound needs to reach you (about 3 milliseconds per meter) makes up for the 49-millisecond difference in processing speed. Sound produced farther away, say 100 meters or more from the viewer, is noticeably delayed.
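The break-even distance in the paragraph above can be checked with a few lines of arithmetic. This is a minimal sketch under a simple additive model (travel time plus processing latency for each sense). It assumes a speed of sound of 343 m/s (dry air at about 20 °C) and takes the latencies from the text, using 1 ms as an upper bound for the ear. The helper names are hypothetical.

```python
# Assumed constants: 343 m/s is the speed of sound in dry air at about 20 °C.
SPEED_OF_SOUND_M_PER_S = 343.0
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

# Latencies from the text: "less than 1 ms" for the ear (1 ms used as the bound), ~50 ms for the eye.
AUDITORY_LATENCY_MS = 1.0
VISUAL_LATENCY_MS = 50.0

def travel_ms(distance_m, speed_m_per_s):
    """Time in milliseconds for a signal to cover distance_m at the given speed."""
    return distance_m / speed_m_per_s * 1000.0

def audio_lead_ms(distance_m):
    """How much earlier the sound is registered than the sight (positive = sound first)."""
    sound_total = travel_ms(distance_m, SPEED_OF_SOUND_M_PER_S) + AUDITORY_LATENCY_MS
    light_total = travel_ms(distance_m, SPEED_OF_LIGHT_M_PER_S) + VISUAL_LATENCY_MS
    return light_total - sound_total

def break_even_distance_m():
    """Distance at which the slower sound exactly cancels the faster auditory processing."""
    processing_gap_s = (VISUAL_LATENCY_MS - AUDITORY_LATENCY_MS) / 1000.0
    # Solve d / c_sound - d / c_light = processing_gap for d.
    return processing_gap_s / (1.0 / SPEED_OF_SOUND_M_PER_S - 1.0 / SPEED_OF_LIGHT_M_PER_S)

print(f"break-even distance: {break_even_distance_m():.1f} m")
for d in (1.0, 17.0, 100.0):
    print(f"{d:6.1f} m: audio leads by {audio_lead_ms(d):7.1f} ms")
```

With these numbers the break-even distance comes out at about 16.8 m, inside the 15 to 20 m range quoted above. Because the speed of sound rises with air temperature, a different assumed value shifts this figure slightly.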

But if the drummer is close by (in this case, the camera was placed just 29 cm from the drum), then in principle we should perceive the sound of the drum before we perceive the hand hitting it. And we do. Roberto Arrighi, David Alais, and David Burr have measured the gap between when viewers perceive a sound and when they see the action that produced it. Even more remarkably, they showed that this gap varies with the circumstances.

The human perceptual system doesn't process all inputs at the same speed: for example, simultaneous changes in the motion and color of an object have been shown to be perceived as occurring at different times. Intriguingly, in 2004, C. Aymoz and P. Viviani found that this difference disappeared when the motion depicted was the natural movement of a human. Arrighi's team was excited to extend this research to hearing as well: that's where the conga video comes into play.

They taped the drummer playing at three different tempos: 1, 2, or 4 beats per second. Then they played the video back to three viewers with the audio track randomly shifted by up to 300 milliseconds. Viewers simply indicated whether the audio and video were in sync. As you may have guessed by now, all three viewers made the same systematic error: when the sound was delayed by around 50 milliseconds, they were significantly more likely to say it was in sync than when it actually was precisely in sync. This delay corresponds closely to the difference in processing speed between the visual and auditory systems.
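One common way to summarize yes/no synchrony judgments like these is to find the audio offset at which "in sync" answers peak, often called the point of subjective simultaneity (PSS). The sketch below simulates a hypothetical observer whose probability of answering "in sync" is a Gaussian centered on a 50 ms audio lag, then recovers that lag as the centroid of the response curve. The observer model, the 80 ms width, the 25 ms offset grid, the trial count, and the sign convention (positive = audio lags video, negative = audio leads) are all assumptions for illustration. The simulated numbers are not the study's data, and the centroid is a simple stand-in for the curve fitting a real analysis would more likely use.

```python
import math
import random

def p_sync(offset_ms, pss_ms, width_ms):
    """Hypothetical observer: probability of answering "in sync" at a given
    audio offset, a Gaussian bump centred on the observer's PSS."""
    return math.exp(-((offset_ms - pss_ms) ** 2) / (2.0 * width_ms ** 2))

def simulate_judgments(offsets_ms, true_pss_ms, width_ms, trials, rng):
    """Proportion of simulated "in sync" answers at each offset.
    Assumed convention: positive offset = audio lags the video."""
    proportions = []
    for offset in offsets_ms:
        p = p_sync(offset, true_pss_ms, width_ms)
        yes = sum(rng.random() < p for _ in range(trials))
        proportions.append(yes / trials)
    return proportions

def estimate_pss(offsets_ms, proportions):
    """Centroid of the response curve: each offset weighted by how often
    it was judged synchronous."""
    total = sum(proportions)
    return sum(o * p for o, p in zip(offsets_ms, proportions)) / total

rng = random.Random(2006)             # fixed seed so runs are repeatable
offsets = list(range(-300, 301, 25))  # -300 ms (audio first) to +300 ms (audio late)
props = simulate_judgments(offsets, true_pss_ms=50.0, width_ms=80.0, trials=200, rng=rng)
pss = estimate_pss(offsets, props)
print(f"estimated PSS: {pss:.1f} ms (positive = audio lagging video)")
```

The estimate lands close to the 50 ms lag built into the simulated observer. A real analysis would typically fit a curve to each viewer's responses and compare the fitted peaks across conditions such as tempo.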

But Arrighi's team found more. As the drummer's tempo increased, the viewers' error decreased:

[Figure: Arrighi et al. (2006), perceived audio delay at each drumming tempo]

Could either effect (the overall delay or its decrease with tempo) be related to the biological-motion phenomenon observed by Aymoz and Viviani? To find out, Arrighi's team created two new movies based on the original conga movie. In the first, the drummer's hands were replaced by dots, but their natural motions were preserved.

In the second, instead of natural biological motion, the "drumming" action was produced by strictly linear motion.

Was there a difference between the delays perceived with natural biological motion and with artificial linear motion? No. The results were essentially the same, with the same decrease in perceived delay as the drumming rate increased.

So are we simply more accurate at judging synchronization as the drumming rate increases? No. In a third experiment, Arrighi's team randomized the drumming rates and tested even higher ones. Eventually, sounds occurring before the visual stimulus were accepted as synchronized. The perceived delay did not just shrink toward zero but went past it, which points to a systematic shift rather than a gain in accuracy.

Why does this occur? Arrighi's team speculates that it may be related to how the brain responds to repeated stimuli: in monkeys, when a visual stimulus is repeated slowly, the brain continues to respond for about 80 milliseconds after each stimulus, but when the stimulus repeats more frequently, the response shortens to about 30 milliseconds. Perhaps a similar process occurs in human brains, shortening the perceived delay when sounds or visual images repeat faster.

Arrighi, R., Alais, D., & Burr, D. (2006). Perceptual synchrony of audiovisual streams for natural and artificial sequences. Journal of Vision, 6, 260-268.


There is a behavioral phenomenon called "negative synchronisation error". When we tap along (e.g. with a finger) to the sound of a metronome, our taps have been shown to come about 20-50 milliseconds before the metronome's pulse. It seems to be a systematic miscalculation of the time intervals. It is possible that when we watch a person playing a steady, repetitive rhythm, we underestimate the acoustic interval, so the player seems synchronized in sound and vision even when the sound comes before the visual impulse. In my opinion, this is primarily a perceptual issue rather than a neural one.