How do we recognize scenes?

Take a look at this movie (you'll need a video player like QuickTime or Windows Media Player installed in your browser to see it). You'll see four different outdoor scenes flash by, one at a time. The scene itself will only be displayed for a fraction of a second, followed immediately by a distraction pattern designed to mask any image left over in your visual system. Your job is to spot any desert or mountain scene. Watch carefully!

Did you spot them? What cued you in to the idea of a "desert" or a "mountain" scene? Was it a specific object in the picture (a mesa or a snowfield)? Was it a color? Perception research has historically focused more on the idea of objects or parts of objects (borders, curves) than entire scenes. But is that the way our visual system actually works? What if people are actually taking in the whole scene rather than (or in addition to) focusing in on individual objects?

Michelle Greene and Aude Oliva had 55 viewers rank hundreds of scenes for seven general properties: Concealment (C), Transience (Tr), Navigability (N), Temperature (Te), Openness (O), Expansion (E), and Mean Depth (Md). The pictures were presented on a 30-inch color monitor in groups of 100. So if a rater ranked pictures for Navigability, she would drag half the pictures (the least navigable) to the left of the screen and the other half (the most navigable) to the right. Each of these groups was then divided in half two more times, creating a spectrum of eight groupings from least to most navigable (there's a code sketch of this splitting procedure below). The least navigable pictures might be a dense forest or a steep cliff, while the most navigable might be an open field or a road. Not every viewer rated every picture or property, but at least ten viewers rated each picture for each property. Here's how the ratings broke down for four types of scenes:

[Figure: box plots of rankings on the seven global properties for four scene categories]

The boxes span the middle 50 percent of the rankings, so as you can see, nearly all field scenes were ranked high on Navigability, and most mountain scenes were ranked low. Deserts were ranked high for Temperature, while mountains were ranked low.
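To make the splitting procedure concrete, here is a minimal sketch of the three successive median splits. It's my own illustration, not the authors' code, and it assumes each image already carries a numeric rating, whereas the actual raters sorted pictures by dragging them, without assigning numbers:

```python
import random

def median_splits(rated_images, n_splits=3):
    """Sort images by a property rating, then halve the groups three times,
    yielding eight ordered bins from least to most (e.g., navigable)."""
    groups = [sorted(rated_images, key=lambda item: item[1])]
    for _ in range(n_splits):
        halved = []
        for group in groups:
            mid = len(group) // 2
            halved.extend([group[:mid], group[mid:]])  # lower half, upper half
        groups = halved
    return groups  # groups[0] = least navigable ... groups[7] = most navigable

# Hypothetical example: 100 images with made-up navigability ratings
images = [(f"img_{i}", random.random()) for i in range(100)]
bins = median_splits(images)
print([len(b) for b in bins])  # eight bins of 12-13 images each
```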

Next, a new set of 73 viewers watched hundreds of movies like the four that I showed you above -- only the scenes were flashed for an even shorter time (30 milliseconds, difficult to duplicate online). They saw movies in groups of 50. So, for example, during the first 50 movies they might be asked to identify whether or not a lake scene had flashed by. Then for the next 50 they would identify mountain scenes, and so on. This graph shows how they did:

[Figure: accuracy at rejecting non-target scenes, plotted by distance from the target category's prototype]

This graph shows accuracy in rejecting scenes that weren't of the desired category. So, for example, if a viewer was looking for forest scenes, the typical forest would rank very low on Openness. Mountains rank lower on Openness than deserts do, so their distance to the prototypical forest is smaller than that of desert scenes. As you can see, accuracy was lower for scenes closer to the prototype: viewers looking for forests made more mistakes on mountain scenes than on desert scenes. The results in this graph are averaged over all seven properties and all eight scene types, and the pattern still holds.
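To see why mountains sit closer to the forest prototype, imagine each scene as a vector of its seven property rankings and measure its Euclidean distance to the category's average vector. The numbers below are invented for illustration (roughly following the rankings described above), and the paper's actual distance measure may differ:

```python
import math

PROPERTIES = ["concealment", "transience", "navigability",
              "temperature", "openness", "expansion", "mean_depth"]

def prototype_distance(scene, prototype):
    """Euclidean distance between a scene and a category prototype
    in seven-dimensional property space."""
    return math.sqrt(sum((scene[p] - prototype[p]) ** 2 for p in PROPERTIES))

# Invented property vectors on the 1-8 ranking scale
forest = {"concealment": 7, "transience": 3, "navigability": 3,
          "temperature": 4, "openness": 1, "expansion": 3, "mean_depth": 4}
mountain = {"concealment": 4, "transience": 3, "navigability": 2,
            "temperature": 2, "openness": 5, "expansion": 4, "mean_depth": 7}
desert = {"concealment": 2, "transience": 2, "navigability": 6,
          "temperature": 8, "openness": 7, "expansion": 5, "mean_depth": 6}

print(prototype_distance(mountain, forest))  # ~6.3: close, so harder to reject
print(prototype_distance(desert, forest))    # ~9.7: far, so easier to reject
```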

But perhaps viewers aren't really classifying the scenes based on these general properties -- couldn't it be true that mountains and forests just tend to have similar objects compared to deserts?

To test this possibility, Greene and Oliva built two Bayesian classifiers. One was trained to classify images based only on the global properties of each image, as rated by the humans at the start of the study. The other was trained to classify the images based on the physical objects in each scene: trees, water, rock, flowers, and so on. The property-classifier's results matched the human results nearly exactly, while the object-classifier's results diverged substantially. When the property-classifier made a mistake, it was similar to the mistakes the humans made, like mistaking a waterfall for a river. When the object-classifier made an error, it was unlike the humans', like mistaking a desert for a field.
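As a rough sketch of the two-classifier comparison (my own illustration with random stand-in data, not the authors' actual model), a Gaussian naive Bayes learner can be fit to the seven continuous property rankings, while a Bernoulli naive Bayes learner fits binary object-presence features:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB

rng = np.random.default_rng(0)

# Stand-in data: 500 scenes across 8 categories (field, mountain, desert, ...)
labels = rng.integers(0, 8, size=500)
X_props = rng.uniform(1, 8, size=(500, 7))      # seven property rankings
X_objects = rng.integers(0, 2, size=(500, 40))  # presence of 40 object types

property_clf = GaussianNB().fit(X_props, labels)   # property-based classifier
object_clf = BernoulliNB().fit(X_objects, labels)  # object-based classifier

# Greene and Oliva compared each classifier's confusion patterns with the
# confusions human viewers made (e.g., waterfall mistaken for river).
print(property_clf.predict(X_props[:5]))
print(object_clf.predict(X_objects[:5]))
```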

Greene and Oliva are careful to say that global properties may not be the only way we identify scenes, but it does seem clear from these results that the properties of a scene are a very important part of how we initially identify it.

Greene, M. R., & Oliva, A. (2009). Recognition of natural scenes from global properties: Seeing the forest without representing the trees. Cognitive Psychology, 58(2), 137-176. PMID: 18762289


Curious whether this is influenced by real-life experience. Perhaps someone who sees hills or mountains every day when they look out their window would find it easier to recognise a mountain than someone who has only seen hills and mountains in books or on the internet.

Using QuickTime Alternative 2.9.0, no video here either. Poking around, it seems the QT Alternative plugin will not be called for files with the .mp4 extension, which is what Dave is using now. Not sure if there is a way to force or add the extension in QT Alternative's settings (as it can indeed open MPEG-4 compressed videos), but I'll look into it...