How the brain interprets complex visual scenes is an enduring mystery for researchers. This process occurs extremely rapidly – the “meaning” of a scene is interpreted within 1/20th of a second, and, even though the information processed by the brain may be incomplete, the interpretation is usually correct.
Occasionally, however, visual stimuli are open to interpretation. This is the case with ambiguous figures – images which can be interpreted in more than one way. When an ambiguous image is viewed, a single image impinges upon the retina, but higher order processing in the visual cortex leads to a number of different interpretations of that image.
Only one of these interpretations is available to our conscious awareness at any one time. Repeated viewing of the image leads to perceptual reversal, whereby first one, and then the other, interpretation is perceived. For psychologists and neuroscientists, ambiguous figures provide a means by which the functioning of the human visual system can be investigated.
Salvador Dalí’s 1940 painting Slave Market with the Disappearing Bust of Voltaire (top) is an example of an ambiguous figure. In this painting, the two nuns just left of centre can also be perceived as the bust of the French writer and philosopher Voltaire. As we look at the painting, our perception switches from one interpretation to the other.
In a study published in 2002, Lizann Bonnar, then at the University of Glasgow, and her colleagues investigated the stimuli that drive perception of the visual scene depicted in Dalí’s painting. Participants were presented with a cropped greyscale version of the painting, consisting solely of the area containing the nuns. A “bubble” filter was used to enhance or obscure certain features of that part of the painting. The researchers found that participants reported seeing the bust of Voltaire when the finer details of the painting were obscured, and reported seeing the nuns when large-scale features were obscured.
This experiment showed the importance of scale information in perception. The researchers specifically manipulated the spatial frequency content of the painting (that is, the periodicity with which image intensity changes). Large-scale features change little over a given distance, and therefore have a low spatial frequency, while fine-grained features change much more over the same distance, and so have a high spatial frequency.
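The periodicity described above (usually called spatial frequency) can be made concrete with a small sketch. The example below is purely illustrative and is not the method used in the study: it builds two one-dimensional "scanlines" of a sinusoidal grating, one coarse and one fine, and finds the dominant number of cycles in each with a naive discrete Fourier transform. The function names `grating` and `dominant_frequency`, the 64-pixel width, and the choice of 2 and 16 cycles are all assumptions made for the illustration.

```python
import cmath
import math


def grating(cycles, width=64):
    """One scanline of a sinusoidal grating with `cycles` full periods across `width` pixels."""
    return [0.5 + 0.5 * math.sin(2 * math.pi * cycles * x / width) for x in range(width)]


def dominant_frequency(row):
    """Return the non-zero frequency (in cycles per row) with the largest DFT magnitude."""
    n = len(row)
    mean = sum(row) / n  # remove the average brightness so it does not dominate
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum((v - mean) * cmath.exp(-2j * math.pi * k * x / n)
                    for x, v in enumerate(row))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k


coarse = grating(cycles=2)   # intensity changes slowly: large-scale structure
fine = grating(cycles=16)    # intensity changes quickly: fine-grained detail

print(dominant_frequency(coarse), dominant_frequency(fine))  # prints: 2 16
```

A real image contains many such components at once, in two dimensions; obscuring "fine details" or "large-scale features" amounts to attenuating the high or the low end of that range.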
In a second experiment, the participants were shown random noise patterns before the cropped greyscale painting. One group was shown a pattern with a high spatial frequency, the other a pattern with a low spatial frequency. Afterwards, the former reported seeing the bust of Voltaire, while the latter reported seeing the nuns. This showed that previous experience is an important factor in perception. Exposure to one frequency channel appears to have reduced the participants’ sensitivity to it, so that they perceived the interpretation carried by the other channel when they viewed the image.
Aude Oliva, head of the Computational Visual Cognition Laboratory at the Massachusetts Institute of Technology, has been using a similar approach to gain a better understanding of the processing of information in the visual cortex.
For more than 10 years, Oliva and her colleagues have been creating and using hybrid images that consist of two superimposed images, both of which have been altered with specialized filtering software.
Using these filters, sharp facial features, such as wrinkles and other blemishes, are removed from one image, and coarse features, such as the shape of the mouth or nose, are removed from the other. The two images are then superimposed; because features with a high spatial frequency are visible only from up close, and those with low spatial frequencies are only visible from further away, superimposition of the two produces a single image whose perception changes as a function of viewing distance.
Thus, the hybrid is a single image with two stable percepts; at a given distance, only one of the images is visible, and it is this image that dominates processing in the visual system; the other image is perceived as something lacking internal organization (noise).
Above is an example of the hybrid images created by Oliva’s group. From up close, the image is perceived as Albert Einstein, because only the sharp features are visible; but if you step a few metres away from the monitor, the blurred features become visible, and the image of Marilyn Monroe emerges.
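The construction of a hybrid described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the filtering software used by Oliva's group: it works on one-dimensional scanlines rather than photographs, uses a simple box blur as the low-pass filter, and treats "viewing from a distance" as a further blur. The names `box_blur`, `make_hybrid` and `mean_abs_diff`, the blur radius, and the two sinusoidal test signals are all hypothetical.

```python
import math


def box_blur(row, radius):
    """Low-pass filter: replace each sample with the mean of its neighbourhood (edges clamped)."""
    n = len(row)
    blurred = []
    for i in range(n):
        window = [row[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def make_hybrid(far_row, near_row, radius=4):
    """Keep the coarse content of `far_row` and the fine content of `near_row`, then add them."""
    coarse = box_blur(far_row, radius)                       # fine detail removed
    fine = [v - b for v, b in zip(near_row, box_blur(near_row, radius))]  # coarse content removed
    return [c + f for c, f in zip(coarse, fine)]


def mean_abs_diff(a, b):
    """Average absolute difference between two equally long rows."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


width = 64
far_row = [math.sin(2 * math.pi * 1 * x / width) for x in range(width)]    # slow variation
near_row = [math.sin(2 * math.pi * 16 * x / width) for x in range(width)]  # rapid variation
hybrid_row = make_hybrid(far_row, near_row)

# Crude stand-in for stepping back from the screen: fine detail is blurred away.
distant_view = box_blur(hybrid_row, 4)
# What the blur removed, i.e. the detail that is only available up close.
detail_view = [h - d for h, d in zip(hybrid_row, distant_view)]

print(mean_abs_diff(distant_view, far_row) < mean_abs_diff(distant_view, near_row))
print(mean_abs_diff(detail_view, near_row) < mean_abs_diff(detail_view, far_row))
```

With these test signals both comparisons should come out `True`: the blurred hybrid resembles the coarse source, while the detail lost to blurring resembles the fine source. A Gaussian filter applied in two dimensions would be the more usual choice for real images; the box blur is used here only to keep the sketch dependency-free.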
Oliva’s group has been using this and similar images to investigate the role of different frequency channels in image recognition, and the time course over which this process occurs. They have found that when participants were shown hybrid images for 30 milliseconds, they recognized only the low spatial frequency component of the image; when the images were displayed for 150 milliseconds, they recognized only the high spatial frequency component. In both cases, the participants were oblivious to the other interpretation of the image.
Participants were also shown hybrids of superimposed male and female faces, one sad and the other angry (at high and low spatial frequencies, respectively). When the images were displayed for 50 milliseconds and the participants were asked to determine the emotion of the face they had seen, they always reported seeing an angry face; but when asked to determine the sex of the person in the image, they reported seeing a male as often as a female, even though the two faces had different spatial frequencies.
Thus, selection of frequency bands during fast image recognition appears to be flexible – in some cases, the brain picks out characteristics with a low spatial frequency, while in others, it discriminates those with a high spatial frequency. It seems that the brain is adept at selecting the frequency band containing the most information relevant to a particular task. Again, the participants were unaware that the images they viewed contained information in the other frequency range.
The work carried out by Oliva’s group shows that the brain extracts large-scale features slightly earlier than fine-grained features. Large-scale features are processed within 50 milliseconds, giving an overall impression of the visual scene. The processing of fine-grained details begins slightly later, at around 100 milliseconds. The fine- and coarse-grained features are extracted separately and processed in parallel through different channels, in successively higher-order areas of the visual cortex. In a process called perceptual grouping, the information from the channels is then seamlessly recombined in the highest-order visual cortical areas to produce a coherent, and usually unambiguous, image.