Cognitive Daily

There was plenty of interest in yesterday’s audio-visual illusion. In case you missed it, I’ll post it again here:

Play the movie with the sound turned up. If the illusion works, then you’ll see a dot flash twice, accompanied by two beeps. But actually the dot only flashes once. Unfortunately, we’ve had a hard time getting viewers to see the illusion (as of this writing, just over a quarter of those viewing the video claim to see two or more flashes). I have a couple ideas about why people don’t seem to see it reliably here on Cognitive Daily, and I’m going to investigate them further on Casual Friday. But for those who do see the illusion, I thought I’d offer a little more explanation about how it works.

What we’re talking about here is a cross-modal effect: one perceptual system (in this case, vision) is affected by another (sound). Cross-modal effects are more common than you might think. The visual system, for example, relies on the sensorimotor system to a very large extent. Think about it this way: when you take a video with a camcorder, every little motion of the camera gets translated into a big jitter on your final video. Videos taken from a car window on a bumpy road are practically unwatchable. Yet sitting in the car, you have no problem seeing all the scenery, with no jitters. That’s because your visual system takes into account the sensed motion of your body and adapts.

The visual system shares information with the auditory system as well. Ventriloquists rely on this phenomenon all the time: Since we see the dummy’s mouth moving, we perceive the ventriloquist’s voice as coming from the dummy.

Why would we have evolved to disregard the source of a sound, to perceive it as coming from a different source than it really is? I suspect it’s because in nature, sound can be displaced, via echoes or other phenomena, and so the visual input synchronized with the sound is usually more reliable.

Shams’ illusion, however, is the reverse of the ventriloquism effect: A sound is inducing a visual perception. As Shams et al. point out, the illusion doesn’t work in reverse: if we see three flashes accompanied by one beep, we are never fooled into believing we only saw one flash. In a separate experiment, Shams’ team found that people watching a cello plucking video accompanied by a cello bowing sound often reported hearing the plucking sound. Why didn’t they report seeing bowing instead? When does the visual system take precedence, and when does the audio system?

You might think it relates primarily to context: whichever mode is most appropriate “wins.” But in the case of the Shams illusion, the visual mode is always what observers are paying attention to, and sometimes the audio stimulus affects it and sometimes it doesn’t. More likely, Shams et al. argue, the outcome depends on the particular stimuli themselves: a discontinuous stimulus (beeping, plucking) is more likely to alter the perception of a continuous stimulus than the other way around.

Clearly, however, there are limits to this effect. In the context of Cognitive Daily, most viewers don’t see it. As Mick Grierson noted in the comments yesterday, the fluidity of the effect can have interesting implications in the arts. I’d be very interested to see a work of art that explores the boundaries of this illusion: when it works, and when it doesn’t.


  1. #1 Chris Hinkle
    October 24, 2006

    Your presentation format may be to blame. I would use Flash instead of QuickTime for many reasons.

    Also, there might be a feedback issue: the amount of time the movie plays vs. the amount of time it takes to press play, factored with the amount of time it takes to change your focus from the play button to the movie, etc.

    I’ll be happy to supply you with the asset if you’d like, simply send me an email.

  2. #2 CC
    October 24, 2006

    Your average consumer monitor refreshes every 25 milliseconds, so depending on the duration of the dot image, you may get artifacts resulting from monitor refresh. I assume they controlled for this, and ensured that the stimulus duration was some multiple of 25 milliseconds, but this is not mentioned in the original Nature paper. The problem is that you might get a “half-drawn dot” at the end that could affect your results.

    If this matters, it might explain why you’ve gotten mixed results here (some monitors may refresh slower or faster, and newer ones may refresh at 60 Hz).

    But it’s the end of a rather long day for me, so I may be missing something.

  3. #3 Dado
    October 24, 2006

    >> As Mick Grierson noted in the comments yesterday, the fluidity of the effect can have interesting implications in the arts. I’d be very interested to see a work of art that explores the boundaries of this illusion: when it works, and when it doesn’t.

    Cross-modal effects have already been used in old movies. When there was a jump cut that was impossible to correct, editors used to add a sudden sound to make it invisible, such as a horn if the scene was in a city. If I remember correctly, there are many examples in Hitchcock’s movies, because he used to make his shots as short as he could so that the producers wouldn’t modify his movie, and sometimes they didn’t fit well together.

  4. #4 Bongo
    October 25, 2006

    I would speculate that notwithstanding the instructions, most viewers didn’t have their sound turned up.

  5. #5 Raymond
    October 25, 2006

    Bongo, I doubt it. I think it actually doesn’t work for most people…

    Interesting though, I’d have thought most people’s monitors refresh at 60Hz, this being the default for many monitors. (I’ve deployed 100s at 60Hz). I don’t have QT embedded in my browser so perhaps pressing play and changing my focus also had a part to play. I’d have thought it better to have an mpeg to download and play fullscreen with a bit of delay before the stimulus starts – at least this means everyone has the image in the same part of the screen.

  6. #6 etbnc
    October 27, 2006

    Regarding the technology factor, I worry that this sort of experiment may not work well over the Web. There’s a lot of potential variation in web users’ hardware and software, and I’m not sure how well the experiment can control for that.

    Regarding old movies: Thanks for that inside info. Also, how many of us have noticed the colored dots used to signal theatre projectionists to start the next reel of film? They’re BIG, but I only noticed them after a friend who worked as a projectionist pointed them out to me. Now I can’t ignore them. Doh!

    Projectionist signals may be less frequently used these days with high-tech automated movie projectors, but few people seem to notice them anyway. But then these new-fangled DVDs have that annoying layer-change fr—eeze! It amazes me how badly timed those pauses can be, and how often they could be hidden by adjusting their timing by just a few seconds. Sheesh…


  7. #7 Leigh
    October 28, 2006

    I don’t think it’s so much the sound causing the problem as it is the registration of two separate visual events. The dot flashes onto the screen, and then, at least in terms of you the visual system registers it without conscious filtration, a white dot flashes into place to cover the black dot. As someone with Asperger’s who doesn’t filter things the normal way, I consciously count two flashes. And as this may be to fast for people without that “syndrome” to process consciously, I think that may be what you are dealing with.

  8. #8 Leigh
    October 28, 2006

    you = how, to=too