Take a look at this quick movie. What you’ll see is two sets of three views of the same scene (our living room). For each group of three views, your job is to decide if the third view is taken from the same angle as one of the two previous views. After the first two views flash by, the text “Same?” will appear, and that’s your cue to decide if the third view is the same as EITHER of the first two. Give it a shot:
Do you think you got the correct answer? Was either of the two sequences more difficult than the other?
The movie was inspired by a recent experiment led by Monica Castelhano. Castelhano’s team was interested in how we remember scenes. While there has been a fair amount of work on how we remember objects, the work on scenes is inconclusive. Do we store a holistic, three-dimensional representation of a scene (defined as a coherent, visible part of the world) in our heads, allowing us to consider it from any angle? Or do we remember scenes as separate snapshots, each one processed independently from the next?
For objects, researchers have found that an object (like, say, one of several different toy horses) is more readily recognizable from a different viewpoint if that viewpoint is between two different viewpoints that we’ve seen before. If you’ve seen a horse from the front and the side, you’re better able to recognize it from a 3/4 view halfway between the front and side views than from a 3/4 view halfway between the side and back views. This suggests that we have at least a partial three-dimensional representation of the object in our memory; we’re not just recalling the particular views. Does the same principle hold for scenes?
Castelhano’s team showed 36 college students movies similar to the ones in my video, but the scenes they saw were created using 3-D rendering software so that the camera angle and lighting could be more carefully controlled. Sometimes the final picture was the same as one of the first two in the sequence, and sometimes it was different in one of two ways — either it was between the first two views, or outside of the first two views. This diagram shows how it worked:

The first and second camera views were 40° apart. When the third view was new, it was always 20° away from at least one of the first two views: sometimes in between them (B), and sometimes outside them (A or C). As you did, the students had to say whether the third view was new or old, that is, whether or not it was the same as one of the previous two views.
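To make the three cases concrete, here is a minimal sketch of the geometry in Python. The specific angles (views 1 and 2 at 0° and 40°, with A, B, and C at -20°, 20°, and 60°) and the function name are my own assumptions for illustration, not values or code from the study, and the sketch ignores wrap-around at 360°.

```python
def classify_third_view(view1_deg, view2_deg, view3_deg):
    """Label view 3 relative to the first two camera angles (in degrees)."""
    low, high = sorted((view1_deg, view2_deg))
    if view3_deg in (view1_deg, view2_deg):
        return "old"            # identical to one of the earlier views
    if low < view3_deg < high:
        return "new, between"   # falls inside the range already seen
    return "new, outside"       # falls beyond the range already seen


# Hypothetical angles: views 1 and 2 at 0 and 40 degrees.
for label, angle in [("A", -20), ("B", 20), ("C", 60), ("repeat", 40)]:
    print(label, classify_third_view(0, 40, angle))
```

Under these assumed angles, A and C come out as "new, outside", B as "new, between", and a repeat of view 2 as "old".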
Here are the results:

This graph shows accuracy in identifying “new” views. In fact, all views in this case are new: the third picture is shifted 20° from one or both of the previous views. As you can see, viewers were significantly more accurate at identifying the new views when they fell outside of the previous views. When view 3 fell between views 1 and 2 (case B), accuracy sank below 50 percent.
The researchers say this demonstrates that viewers aren’t simply remembering static pictures and then comparing them to the final picture. If that were the case, accuracy would have been similar for all three cases. Arguably it should even have been best for case B, since that view was near both view 1 and view 2 and therefore easy to compare.
They suggest that at least part of our visual representation of a scene is holistic — we don’t remember individual views, but the layout of the entire scene. Since our memories of the middle part (between the first and second views) are blended together, it’s more difficult for us to realize that the third view is actually different.
Castelhano, M., Pollatsek, A., & Rayner, K. (2009). Integration of multiple views of scenes. Attention, Perception & Psychophysics, 71(3), 490-502. DOI: 10.3758/APP.71.3.490