How does moving around a scene mess up our memory?

We're pretty good at remembering objects in a complex scene. We can even remember those objects after we move to a different location. However, the research so far has found that memory for the original view is a little better than memory for the scene after we've moved. Much of that research, however, has focused on relatively complex movements: Viewers are asked to remember an array of objects viewed from one side of a room, then are transported to a different part of the room and asked to decide whether the objects are arranged in the same pattern (actually, they're sitting at a computer watching static images, but the camera has moved).

This sort of motion actually encompasses two separate motions: translation and rotation. For example, in this crude figure representing a room, a viewer moving from the bottom to the right-hand side of the room would have to not only walk to a different part of the room, but also rotate his body (or at least his head) to see the objects in the room.

[Figure: diagram of a room, with a viewer moving from the bottom to the right-hand side while the objects stay in place]
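To make the two components concrete, here's a minimal sketch (the room coordinates are invented for illustration, not taken from the study) that computes how far such a viewer walks and how much he must turn to keep facing the objects:

```python
import math

def heading_toward(viewer, target):
    """Direction (in degrees) the viewer must face to look at the target."""
    dx, dy = target[0] - viewer[0], target[1] - viewer[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical room coordinates in metres: the objects sit at the room's center.
objects = (0.0, 0.0)
start = (0.0, -3.0)  # viewer standing at the bottom wall
end = (3.0, 0.0)     # viewer standing at the right-hand wall

translation = math.dist(start, end)
rotation = heading_toward(end, objects) - heading_toward(start, objects)

print(f"walked {translation:.2f} m")     # ~4.24 m of translation
print(f"turned {rotation:.0f} degrees")  # 90 degrees of rotation to keep the objects in view
```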

So what about simpler forms of motion? Do translating and rotating alone affect memory for the arrangement of objects in a scene? Rotating is difficult to assess independently, since if you rotate your body too much, you're no longer viewing the scene. But David Waller was able to consider two different types of translation in scenes: backwards and forwards, and side-to-side.

First, Waller trained viewers to recall the arrangement of five toys in a photograph. Then the photograph was systematically altered by either moving the camera towards or away from the toys, or side-to-side:

[Figure: example photographs of the five-toy scene, with the camera moved toward, away from, and to either side of the toys]
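Because the photographs were made by physically moving the camera rather than by cropping or zooming, each toy's position in the image shifts by an amount that depends on its distance from the camera. A minimal pinhole-camera sketch, using invented toy coordinates rather than Waller's actual layout, illustrates the geometry:

```python
import numpy as np

# Hypothetical tabletop layout (metres): x is left/right, y is distance from the camera line.
toys = np.array([[-0.4, 2.0],   # toy on the left, farther away
                 [ 0.0, 1.5],   # toy in the middle, nearer the camera
                 [ 0.5, 2.2]])  # toy on the right, farthest away

def project(points, camera, focal=1.0):
    """Horizontal image coordinates for a pinhole camera at `camera`
    looking straight down the +y axis."""
    rel = points - camera
    return focal * rel[:, 0] / rel[:, 1]

original   = project(toys, camera=np.array([0.0, 0.0]))
moved_back = project(toys, camera=np.array([0.0, -1.0]))  # camera pulled back 1 m
moved_side = project(toys, camera=np.array([0.5, 0.0]))   # camera shifted 0.5 m to the right

for name, image in [("original", original), ("moved back", moved_back), ("moved side", moved_side)]:
    print(f"{name:10s}: {np.round(image, 3)}")
```

Moving the camera back scales each toy's image position by a slightly different factor, and moving it sideways shifts nearer toys more than farther ones, so neither displacement is a simple crop of the original photo.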

In every case, all the toys were still visible; they were just closer to the viewer or farther away, or shifted to one side or the other. He created similar sets of photos for 10 different arrangements of the toys. Viewers were then shown all the photos and asked which ones showed the correct, original pattern. Here are the results:

[Figure: graph of recognition accuracy for the trained viewpoint versus each displaced viewpoint]

The graph shows accuracy when the scene matched the scene viewers had been trained on. Even for translations alone with no rotation, viewers were significantly more accurate when there was no shift in perspective. But notice that there's one exception: When the viewpoint moved back from the trained viewpoint, there was no drop-off in accuracy. Waller believes this result may be due to boundary extension: We misremember the boundaries of a scene as wider than they really are, so when the camera moves back, we think we're seeing the same scene. There were similar results for reaction times.

But maybe the results for side-to-side displacement are due to something else: maybe it's just harder for us to remember the locations of objects that aren't centered in a scene. So in another experiment, Waller trained some viewers on centered scenes, and other viewers on scenes where the objects were clustered to the left or to the right. That gives three possible positions for the objects -- center, left, or right of the screen. Viewers trained on objects clustered to one side were then tested on scenes with the objects either at the center of the screen or on the opposite side of the screen from where they had been trained. Here are those results:

[Figure: graph of recognition accuracy for viewers trained on centered and side-displaced scenes]

The accuracy rates are lower than in the previous experiment because these images were flashed only briefly -- just 250 milliseconds -- but when viewers learned the centered scene, the pattern matched the earlier results: accuracy was higher for the scene they had learned than for displaced scenes. Surprisingly, however, the results were different when viewers learned scenes that had been displaced to one side: they were just as accurate identifying scenes that had been shifted toward the center as they were on the scenes they had actually learned. Only when the scenes were shifted to the opposite side of the screen did accuracy decrease significantly. This suggests a bias towards recalling the parts of a scene that are closest to the center.

This isn't just due to a tendency to focus on the center of the screen. Before each scene was shown to viewers, a cross appeared to direct their attention to the appropriate part of the computer screen.

Waller says this suggests that our memory for scenes may not include a true three-dimensional representation of the objects in the scene. Instead, we may simply recall "snapshots" taken from the perspective at which we were trained. When the items are centered in the stored snapshot, or in the snapshot we're asked to compare it with, it's easier for us to judge whether they are the same. If we had a true three-dimensional representation of the scene, it's difficult to see how Waller would have found the results he did.
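To see why, consider a toy model of the two possibilities (the coordinates are invented, and this is only a sketch of the logic, not Waller's analysis). A viewpoint-dependent "snapshot" comparison is disturbed by a sideways camera shift almost as much as by a genuine swap of two toys, while a comparison of the 3-D layout itself would be unaffected by where the camera sits:

```python
import numpy as np

def project(points, camera, focal=1.0):
    """Horizontal image coordinates for a pinhole camera looking down the +y axis."""
    rel = points - camera
    return focal * rel[:, 0] / rel[:, 1]

# Hypothetical layout of five toys (x, y in metres), plus a version in which
# two toys trade places -- the kind of change viewers had to detect.
layout = np.array([[-0.4, 2.0], [-0.2, 1.6], [0.0, 1.8], [0.3, 1.5], [0.5, 2.2]])
swapped = layout.copy()
swapped[[0, 4]] = swapped[[4, 0]]

trained_cam = np.array([0.0, 0.0])
shifted_cam = np.array([0.5, 0.0])          # camera displaced sideways at test
stored_snapshot = project(layout, trained_cam)

def snapshot_error(test_layout, test_cam):
    """Viewpoint-dependent model: compare the remembered 2-D snapshot with the test image."""
    return np.abs(stored_snapshot - project(test_layout, test_cam)).max()

def layout_error(test_layout):
    """Viewpoint-independent model: compare the 3-D layouts directly; camera position is irrelevant."""
    return np.abs(layout - test_layout).max()

print("snapshot model, same layout, shifted camera:", round(snapshot_error(layout, shifted_cam), 2))   # ~0.33
print("snapshot model, swapped layout, same camera:", round(snapshot_error(swapped, trained_cam), 2))  # ~0.43
print("3-D model, same layout, any camera:         ", layout_error(layout))                            # 0.0
```

Under the snapshot model, a mere viewpoint shift produces nearly as large a mismatch as an actual change in the layout, which is why displaced test views would cost accuracy; a genuinely three-dimensional memory would not care where the camera was.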

Waller, D. (2006). Egocentric and nonegocentric coding in memory for spatial layout: Evidence from scene recognition. Memory & Cognition, 34(3), 491-504.


I somewhat doubt his explanation of the distance effect. As any photographer knows, moving closer or farther back does _not_ simply make the scene larger or smaller; it actually shifts the relative positions and sizes of the objects in the scene (this is why zooming in with a tele lens is very different from walking in close with a wide lens). In addition, the relative positions of objects do not shift linearly with distance -- moving about at close distances shifts relative object positions much more than doing so at far distances. I would expect a rather similar-looking graph if this, rather than boundary extension, were the explanation.

It doesn't look to me like the camera was zoomed in or out. The focal length was kept constant and the camera was moved, as if the observer had moved towards and away from the scene.

Dave, exactly. As I wrote, this changes not only the extent of the scene; it materially changes the visible internal configuration of the objects, and that can by itself explain the depth graph.

Janne,

It's a very small change, though, and what viewers are looking for is a much larger one -- e.g., two objects being switched in position. People understand that the relative positions of objects change on a two-dimensional plane (e.g., their retinal image or a photo) as their perspective changes. The question is whether, despite this, they can detect actual 3-D changes in the positions of the objects in a scene.

Note that even when the camera moves side-to-side, the relative position of the objects also changes subtly. And, of course, it also does in the more dramatic combined rotation/translation motion of the earlier studies.

Look at the closest two pictures, and the changes are no more subtle than the side-to-side changes -- the airplane (I think) on the left is completely separate in the closest picture and overlaps with other objects in the next closest one; the relative sizes of the close and the farther objects shift quite a bit, and so on. The differences are much more subtle, almost nonexistent, in the farther images, which of course fits the data.

I see your point -- it would be interesting to compare this study with a version that simply cropped the photos instead of literally moving the camera. That could resolve whether boundary extension was causing the result for the far images.