How do we recognize scenes?

Take a look at this movie (you'll need a video player like QuickTime or Windows Media Player installed in your browser to see it). You'll see four different outdoor scenes flash by, one at a time. The scene itself will only be displayed for a fraction of a second, followed immediately by a distraction pattern designed to mask any image left over in your visual system. Your job is to spot any desert or mountain scene. Watch carefully!

Did you spot them? What cued you in to the idea of a "desert" or a "mountain" scene? Was it a specific object in the picture (a mesa or a snowfield)? Was it a color? Perception research has historically focused more on the idea of objects or parts of objects (borders, curves) than entire scenes. But is that the way our visual system actually works? What if people are actually taking in the whole scene rather than (or in addition to) focusing in on individual objects?

Michelle Greene and Aude Oliva had 55 viewers rank hundreds of scenes on seven more general properties: Concealment (C), Transience (Tr), Navigability (N), Temperature (Te), Openness (O), Expansion (E), and Mean Depth (Md). The pictures were presented on a 30-inch color monitor in groups of 100. So if a rater ranked pictures for Navigability, she would drag half the pictures (the least navigable) to the left of the screen, and the other half (the most navigable) to the right. Then each of these groups was divided in half two more times, to create a spectrum of eight groupings, from least to most navigable. The least navigable pictures might be a dense forest or a steep cliff, while the most navigable might be an open field or a road. Not every viewer rated every picture or property, but at least ten viewers rated each picture for each property. Here's how the ratings broke down for four types of scenes:


The boxes correspond to 50 percent of the rankings, so as you can see, for Navigability, nearly all field scenes were ranked high, and most mountain scenes were ranked low. Deserts were ranked high for Temperature, while mountains were ranked low.
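The ranking procedure itself (three rounds of halving, giving eight ordered groups) can be sketched in a few lines of Python. This is only an illustration: in the study a human rater made each split by eye, whereas here a hypothetical numeric `score` function stands in for that judgment, and the function name `split_into_bins` is made up for this example.

```python
def split_into_bins(items, score, rounds=3):
    """Halve the items repeatedly by score; returns 2**rounds ordered groups.

    `score` is a hypothetical stand-in for the human rater's judgment
    (for example, how navigable a picture looks).
    """
    groups = [sorted(items, key=score)]
    for _ in range(rounds):
        next_groups = []
        for group in groups:
            mid = len(group) // 2
            # Lower half goes "left", upper half goes "right".
            next_groups.extend([group[:mid], group[mid:]])
        groups = next_groups
    return groups


# 100 placeholder pictures, identified by number; the score is the number itself.
pictures = list(range(100))
bins = split_into_bins(pictures, score=lambda p: p)
```

With 100 pictures the eight groups cannot all be the same size, so this sketch ends up with groups of 12 or 13 pictures each.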

Next, a new set of 73 viewers watched hundreds of movies like the four that I showed you above -- only the scenes were flashed for an even shorter time (30 milliseconds, difficult to duplicate online). They saw movies in groups of 50. So, for example, during the first 50 movies they might be asked to identify whether or not a lake scene had flashed by. Then for the next 50 they would identify mountain scenes, and so on. This graph shows how they did:


This graph shows accuracy in rejecting scenes that weren't of the desired category, as a function of their distance from the prototype of that category. So, for example, if a viewer was looking for forest scenes, then the typical forest would rank very low on openness. Mountains rank lower on openness than deserts, so a mountain scene's distance to the prototypical forest would be smaller than a desert scene's. As you can see, accuracy was lower for scenes that are closer to the prototype: viewers looking for forests made more mistakes when presented with mountain scenes than with desert scenes. The results in this graph are averaged over all seven properties and all eight different scene types, and the pattern still holds.
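To make the idea of "distance to the prototype" concrete, here is a rough Python sketch. The rank values below are invented for illustration (they are not data from the study), only three of the seven properties are included, and averaging the absolute rank differences is an assumed distance measure, not necessarily the one Greene and Oliva used.

```python
# Hypothetical mean ranks (1 = lowest, 8 = highest). NOT data from the study.
PROTOTYPES = {
    "forest":   {"openness": 2.0, "navigability": 3.0, "temperature": 4.0},
    "mountain": {"openness": 4.0, "navigability": 2.0, "temperature": 2.0},
    "desert":   {"openness": 7.0, "navigability": 5.0, "temperature": 8.0},
}


def distance(category_a, category_b):
    """Mean absolute difference in rank across the listed properties."""
    a, b = PROTOTYPES[category_a], PROTOTYPES[category_b]
    return sum(abs(a[prop] - b[prop]) for prop in a) / len(a)


forest_to_mountain = distance("forest", "mountain")
forest_to_desert = distance("forest", "desert")
```

With these made-up numbers, mountains sit closer to the forest prototype than deserts do, which is the situation in which viewers searching for forests made more false alarms.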

But perhaps viewers aren't really classifying the scenes based on these general properties -- couldn't it be true that mountains and forests just tend to have similar objects compared to deserts?

To test this concept, Greene and Oliva developed Bayesian classifiers using a mathematical model. One classifier was trained to classify images based only on the properties of each image as rated by the humans at the start of the study. The other was trained to classify the images based on the physical objects in the scene: trees, water, rock, flowers, and so on. The simulated results of the property-classifier matched the human results nearly exactly, while the object-classifier was much different from the human results. When the property-classifier made a mistake, it was similar to the mistakes the humans made, like mistaking a waterfall for a river. When the object-classifier made an error, it was different from the humans, like mistaking a desert for a field.
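For readers curious what a property-based classifier might look like, here is a minimal Gaussian naive Bayes sketch in plain Python. It is an assumption-laden illustration: the post says only that the classifiers were Bayesian, so the Gaussian naive Bayes form, the variance floor of 0.25, the function names, and the tiny training set (two properties, openness and temperature, on the 1 to 8 rank scale) are all made up for this example.

```python
import math
from collections import defaultdict


def train(samples):
    """samples: list of (label, [property ranks]). Returns per-class statistics."""
    by_label = defaultdict(list)
    for label, ranks in samples:
        by_label[label].append(ranks)
    model = {}
    for label, rows in by_label.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance (assumed value) so no property has zero spread.
        variances = [max(sum((x - m) ** 2 for x in col) / n, 0.25)
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(samples)), means, variances)
    return model


def classify(model, ranks):
    """Return the label with the highest log posterior for these ranks."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(ranks, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=log_posterior)


# Invented ratings: [openness, temperature], each on the 1-8 rank scale.
training = [
    ("forest", [2, 4]), ("forest", [1, 5]), ("forest", [2, 5]),
    ("desert", [7, 8]), ("desert", [8, 7]), ("desert", [7, 7]),
    ("mountain", [4, 2]), ("mountain", [5, 1]), ("mountain", [4, 1]),
]
model = train(training)
guess = classify(model, [2, 4])
```

An object-based classifier could use the same machinery, with the rank vector replaced by a description of the objects in each scene (trees, water, rock, and so on).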

Greene and Oliva are careful to say that the properties of a scene may not be the only way we identify scenes, but it does seem clear from these results that properties of a scene are a very important part of how we initially identify it.

Greene MR, & Oliva A (2009). Recognition of natural scenes from global properties: Seeing the forest without representing the trees. Cognitive Psychology, 58(2), 137-176. PMID: 18762289


Curious whether this is influenced by real-life experience. Perhaps someone who sees hills or mountains every day when they look out their window would find it easier to recognise a mountain than someone who has only seen hills and mountains in books or on the internet.

Using Quicktime Alternative 2.9.0, no video here either. Poking around, it seems the QT Alternative plugin will not be called for files with the extension of .mp4, which is what Dave is using now. Not sure if there is a way to force or add the extension to QT Alternative's settings (as it can indeed open MPEG-4 compressed videos), but I'll look into it...