Features and Attention Are Coded in V1

Vision is the process by which the brain converts light stimuli into a mental world filled with abstract visual objects. If you stop to think about it, this is an incredible feat. Nothing in the photons coming from two neighboring parts of an object implies that they belong together; rather, the brain parses this information and assembles it into objects.

Scientists thought they had a good model for how this happens, but Roelfsema et al. show in an excellent recording experiment in monkeys how that model is flawed.

Background

One of the important parts of vision is the process of feature selection. Faced with a shower of photons, the brain groups stimuli by their shared properties. For example, when I look at the book on my bookshelf, I notice that all the light rays coming from that book are blue. Likewise, they are coming from a similar location in space. The edges of this blob of blue form a discrete shape. The brain uses all of these features to construct a mental world in which that book is a solid object.

Another important aspect of vision is attention. Some objects matter more than others. When I am playing tennis, the tennis ball is a pretty important object, whereas the fence behind it is not. The visual system must also select objects and weigh their importance according to what matters to you at that moment.

Now, there was a traditional view of how vision is implemented in the brain. In this view, light is received by the retina, and the resulting signals are sent to the visual cortex. (For more information about this, read this post.)

When it arrives in the cortex, the first place this information stops is a part of the occipital lobe called V1 (for visual area 1). From there, the standard model asserted, information was divided up by type (color information, shape information, location information, etc.) and shipped off to higher visual areas to be processed in parallel. This hierarchy is illustrated schematically in this figure from Merigan 1999 (Figure 1):

[Figure: schematic of the hierarchy of visual areas and the dorsal and ventral processing streams (Merigan 1999, Figure 1)]

The top part of the figure shows the connections between different higher-order visual areas. The bottom shows the course the information takes through the brain. For example, the standard model asserts that, in general, information about place takes a dorsal course (up in the diagram) through the brain, while information about form and color takes a more ventral course (down in the diagram). Also, based on evidence from lesions, attentional processing was assumed to take a more ventral course, terminating in that diagram in a region called TE. This model was supported by lesion studies and recordings in macaques and other mammals.

The details of this model are not important here, but there are two aspects to understand:

  • 1) The standard model is hierarchical and unidirectional. In essence, the function of V1 in the standard model is to divide and transport unprocessed visual information to higher areas. It does not process features or attention on its own.
  • 2) The standard model does not answer what is called the "binding problem." The binding problem is how the different features of a single visual object get associated with each other again once they have been separated in the brain. You have to ask how, and more importantly where, the information making up a single object is brought back together.

The standard model is not without its problems, however. The first and biggest problem is an experimental observation: the activity in V1 changes depending on whether an object is being attended to or not. The activity -- measured typically in terms of the firing rate -- associated with an object increases when you attend to it. This is an interesting finding because it suggests that visual processing is not unidirectional. Information from the top about what is important modifies the activity at the bottom.

There is one more important concept to understand before we talk about the paper: the receptive field. A neuron's receptive field is the part of the visual world, and the kind of stimulus, to which it responds best. In general, we speak of receptive fields as regions in space, although for higher-order processing they can also correspond to particular features in particular regions of space. For example, if a neuron responds best to a flash of light in the upper right corner of the visual field, then we say it has a receptive field in the upper right corner. This matters because changes in activity when an object is in a neuron's receptive field are how we know that the neuron is associated with that object. When an object in a neuron's receptive field changes that neuron's activity, we assume the activity is associated with the object.

Our Interesting Paper

This brings us to our interesting paper, Roelfsema et al. Roelfsema et al. recorded neuronal activity in V1 of macaques. They used multi-unit recording, which picks up the pooled activity of small groups of neurons near each electrode and allowed them to record the activity of many neurons throughout this region simultaneously.

The task that Roelfsema et al. used is very interesting. The monkeys were trained to fixate on a point in the center of a screen of random dots. Then, for a brief instant, the random dots on the screen would move, and the edges of two objects would appear against the background of moving dots. The edges appear because the dots belonging to the objects move in the opposite direction to those in the background. Two red circles would also appear within the objects.

The monkey's job during this task was to identify which of the red circles lay in the same object of moving dots as the fixation point in the middle of the screen. This is depicted below (Figure 1 in the paper). First, the animal fixates. Then the dots move, delineating the edges of the objects. Finally, the monkey has to shift its gaze to the red circle that lies within the same set of edges as the original fixation point. The animal is rewarded only if it chooses correctly.

[Figure: sequence of events in the task (Figure 1 in the paper)]

Now that I have utterly butchered that explanation, the example video that the authors provide may help (Movie 1 in the Supplementary Information).

Data

While the monkeys were performing this task, the authors recorded from neurons in V1. They found several interesting things:

  • 1) Consider neurons whose receptive fields fell inside one of the objects on some trials and on the background on others. Their activity was higher when the receptive field was inside an object than when it was on the background. Basically, when you compare trials in which the receptive field was over the background with trials in which it was part of an object, the activity was higher on the object trials. This suggests that activity in V1 carries information about whether a part of the visual world belongs to an object, i.e. V1 carries information about features.
  • 2) Now consider neurons whose receptive fields were on one of the red circles. Their activity was higher when that circle was the target that would be rewarded. Basically, when you compare trials for neurons with receptive fields on the red circles, the neurons are more active when their red circle was the rewarded target than when it was just a distractor. This suggests that activity in V1 carries information about attention, i.e. whether the animal is paying attention to that object.
  • 3) The changes in activity (firing rate) have a specific time course. The increase in firing rate associated with the receptive field being part of an object, i.e. feature selection, occurred about 10 ms after V1 first responded to the object. The increase in firing rate associated with the receptive field being attended to occurred about 80 ms after that.

The authors summarize these changes in firing rate over time in a schematic.

[Figure: schematic of V1 activity across the visual field over the course of a trial]

If the activity of the neurons for all the receptive fields on the computer screen were depicted as a sheet, it would look like the image on the left. The height of the sheet represents the relative firing rate of each neuron. When the task begins, the activity of those neurons increases as the animal tries to perform the task. Then, the first enhancement in activity happens when the animal identifies the edges of objects in the scene. Finally, the second enhancement happens when the animal selectively attends to the correct target.

Why should we care about this study?

Well, for one, it puts the final nail in the coffin of the standard model. We already had evidence that the standard model had serious defects, but this study elegantly shows how it is insufficient. The activity in V1 shows clear signatures of both feature selection and attention. This suggests that information flow in the visual system is not unidirectional but recurrent. The higher-order visual areas that identify features and direct attention must be sending information back to V1.

This study is also interesting because it gets at the core of how the brain represents visual objects. In essence, the binding problem becomes a non-issue, because the activity associated with an object is labeled by activation throughout the visual system. At all levels of visual processing, including V1, the activity associated with a particular object is modified in tandem. Some region may be responsible for coordinating this joint activation, but at least at the level of V1, features are bound together by co-activation.

Hat-tip: Faculty of 1000
