How do we remember scenes?

By dmunger on July 28, 2009.

Take a look at this quick movie. What you'll see is two sets of three views of the same scene (our living room). For each group of three views, your job is to decide if the third view is taken from the same angle as one of the two previous views. After the first two views flash by, the text "Same?" will appear, and that's your cue to decide if the third view is the same as EITHER of the first two. Give it a shot:

Do you think you got the correct answer? Was either of the two sequences more difficult than the other?

The movie was inspired by a recent experiment led by Monica Castelhano. Castelhano's team was interested in how we remember scenes. While there has been a fair amount of work on how we remember objects, the work on scenes is inconclusive. Do we store a holistic, three-dimensional representation of a scene (defined as a coherent, visible part of the world) in our heads, allowing us to consider it from any angle? Or do we remember scenes as separate snapshots, each one processed independently from the next?

For objects, researchers have found that an object (like, say, one of several different toy horses) is more readily recognizable from a different viewpoint if that viewpoint is between two different viewpoints that we've seen before. If you've seen a horse from the front and the side, you're better able to recognize it from a 3/4 view halfway between the front and a side view than from a 3/4 view halfway between the side and the back view. This suggests that we have at least a partial three-dimensional representation of the object in our memory; we're not just recalling the particular views. Does the same principle hold for scenes?

Castelhano's team showed 36 college students movies similar to the ones in my video, but the scenes they saw were created using 3-D rendering software so that the camera angle and lighting could be more carefully controlled. Sometimes the final picture was the same as one of the first two in the sequence, and sometimes it was different in one of two ways -- either it was between the first two views, or outside of the first two views. This diagram shows how it worked:

The first and second camera views were 40° apart, and the third view was always 20° away from one of the first two views -- sometimes in between (B), and sometimes outside of the previous camera view (A or C). As you did, the students had to say whether the third view was new or old -- whether or not it was the same as one of the previous two views.

Here are the results:

This graph shows accuracy identifying "new" views -- in fact in this case all views are new; the third picture is shifted 20° from one or both of the previous views. As you can see, viewers were significantly more accurate identifying the new views when they were outside of the previous views. When view 3 fell between views 1 and 2 (case B), accuracy sank below 50 percent.

The researchers say this demonstrates that viewers aren't simply remembering static pictures and then comparing them to the final picture. If that were the case, accuracy would have been similar for all three cases, and arguably it should have been best for case B, since that view was near both views 1 and 2 and therefore easy to compare.

They suggest that at least part of our visual representation of a scene is holistic -- we don't remember individual views, but the layout of the entire scene. Since our memories of the middle part (between the first and second views) are blended together, it's more difficult for us to realize that the third view is actually different.

More on recognizing scenes here, here, and here.

Castelhano, M., Pollatsek, A., & Rayner, K. (2009). Integration of multiple views of scenes. Attention, Perception, & Psychophysics, 71(3), 490-502. DOI: 10.3758/APP.71.3.490


The first was a definite no, the second I think was yes. The second was more difficult because the control scenes were more similar to each other than in the first experiment.

First: no - I just looked at the position of the windows. But in the second set, I see only two photos.

The three photos in the second set were the same as the three photos in the first set, but in a different order. There were no repeat photos within sets.

I recalled an impression of the scenes with reference to the lightest and darkest areas, and it was easy to see that these didn't match up. Neither set was more difficult as such, but the second set threw me because I wasn't expecting it to re-use the photos from the first.

Bollocks!

This experiment is inconclusive. They should have tested recognition of 2D scenes as well. The reason the accuracy was greatest with A is because it's easy to spot the difference between two consecutive images (3D or not). The reason the middle image was hardest was because it had the least actual retinal difference to the average of the other views.

They need to eliminate the probability of plain 2D comparison, especially since the views were shown on a flat screen -- this is not how we perceive the world around us.

The rant in the beginning about objects being easier to recognize if presented from an angle halfway between previously shown views does not, in any way, suggest that we have "at least a partial three-dimensional representation".

Of course you have a greater chance of recognizing an object if its image is a blend of two previously seen images. Then you have two sets of cues to help you remember it. Rather, if anything, it suggests that we remember views as snapshots.

The first set was very easy to determine that the view was NOT the same. The second set was much harder.

I recognised the first set as a "no", because of the difference in lighting.

For the second set, the video ends before the third picture appears.

When I was looking at the first two photos in each set, I was trying to remember clues that would help me determine if the third photo matched. If I didn't know that I was taking part in a memory exercise, I probably would have looked at the first two scenes in a more holistic way.

Thanks for sharing, but the film is broken. It does not work.

It would be interesting to know whether the in-between shot B, which is closest to the average of the stored information about the scene, does look more familiar than even the original shots 1 and 2. I think an effect like this was found in the (false) recognition of faces.

A rather theoretical view on the Castelhano et al. study.
It would be useful to reconstruct this hypothesis starting from Marvin Minsky's 'visual frame hypothesis' (and then linking it with pattern recognition) and also from a 'classical' hypothesis about the recognition of physical-world geometry by Bertrand Russell 1904 "Philosophy of Geometry".
Any suggestions?
Thanks for your patience,
Tommaso
