The neural processing of color, shape, and location appears to be widely separated in the brain, and yet our subjective experience of the world is highly coherent: we perceive colored shapes in particular locations. How do these distributed representations of visual features get brought or “bound” together to form an integrated percept?
Charles Gray suggests that there are actually many such binding problems – not only between visual features, but between other sensory modalities as well as between perception and action. The brain may have evolved redundant mechanisms for solving this widespread problem, although some have noted that it is only really a “problem” for cognitive neuroscientists, since the brain solves it with such apparent ease: the best estimates of binding suggest it occurs around 10 ms following the distributed processing of color and orientation.
Regardless of whose problem it is, perceptual grouping is a telling example of the difficulty surrounding binding. Gray characterizes this difficulty in terms of three differing computational requirements. First, visual grouping must be flexible, such that we can recognize a given shape or object from a variety of perspectives. Second, visual grouping must also be combinatorial, such that neurons representing the contour of the side of a square in some part of the visual field must also be capable of representing that contour when it is part of some other shape. Finally, this process must be massively parallel, given that real scenes consist of a large number of such visual features and are processed (by most estimates) within 100-300 ms. Gray outlines two hypothetical mechanisms which might be used to satisfy these computational requirements.
The first, hierarchical conjunctive encoding, relies on the established fact that “visual cortex is organized hierarchically into a collection of distinct areas” from the simplest to the most complex. For example, the earliest areas of visual cortex may represent edges (as a conjunction of on and off surround cells). In slightly later areas these are combined into lines (as a conjunction of edges), which are themselves later combined to form corners and contours (as a conjunction of lines). This process is accompanied by increasingly large “receptive field” sizes: as you traverse the hierarchy of visual processing, a given neuron responds to its particular conjunctive feature across an increasingly large area of visual space.
But this mechanism has its own problem, according to Gray, known as “combinatorial expansion.” Essentially, any set of neurons representing a square must receive input from every possible set of 4 contours at any possible location in the receptive field. The number of neurons required to represent a square therefore grows combinatorially with the number of contours and locations. Similar problems occur for every other object, leading to what Gray calls “unacceptably large” numbers of connections (although champions of this theory suggest that such combinatorial expansion does occur – according to them, that’s why so much neural tissue is dedicated to visual processing).
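To get a feel for how quickly this grows, here is a rough back-of-the-envelope sketch. It is not Gray's calculation: the grid sizes, the number of orientations, and the function name `conjunction_count` are all assumptions made for illustration. It simply counts how many distinct sets of 4 contours could be drawn from a population of contour detectors, which is an upper bound on the inputs a dedicated "square" population might have to cover.

```python
from math import comb


def conjunction_count(n_positions: int, n_orientations: int, k: int = 4) -> int:
    """Count the distinct k-contour sets available from a bank of contour detectors.

    Assumption: one contour detector per (position, orientation) pair.
    This counts every possible set of k contours, not only those that
    form a geometrically valid square, so it is an upper bound.
    """
    n_contours = n_positions * n_orientations
    return comb(n_contours, k)


if __name__ == "__main__":
    # Hypothetical square patches of visual space with 4 orientations each.
    for side in (4, 8, 16, 32):
        n_positions = side * side
        print(f"{side}x{side} grid: {conjunction_count(n_positions, 4):,} possible 4-contour sets")
```

Even with these toy numbers, the count grows roughly with the fourth power of the number of contour detectors, which is the intuition behind the worry about "unacceptably large" numbers of connections.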
Gray implies that hierarchical conjunctive encoding might operate in parallel with another mechanism for grouping particular features – temporal population coding. According to this proposal, the neurons which represent “square” do so as a temporally-organized configuration of firing patterns, distributed across many levels in the visual processing hierarchy. Because there are necessarily more possible combinations of activity than there are neurons, this alleviates the combinatorial expansion problem.
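The capacity argument can be made concrete with a small sketch. The numbers and function names below are hypothetical, and real assemblies overlap and are noisy, so treat this as arithmetic rather than a model: it compares one dedicated neuron per object against objects coded as distinct groups of co-active neurons.

```python
from math import comb


def dedicated_capacity(n_neurons: int) -> int:
    """Objects representable if each object gets its own dedicated neuron."""
    return n_neurons


def assembly_capacity(n_neurons: int, assembly_size: int) -> int:
    """Objects representable if each object is a distinct set of co-active neurons.

    Assumption: every assembly has exactly `assembly_size` members and any
    two different sets count as different objects.
    """
    return comb(n_neurons, assembly_size)


if __name__ == "__main__":
    n = 100  # hypothetical population size
    for size in (2, 5, 10):
        print(f"{n} neurons, assemblies of {size}: "
              f"{assembly_capacity(n, size):,} patterns vs {dedicated_capacity(n)} dedicated units")
```

The same population supports vastly more distinct patterns than it has neurons once objects are coded by combinations of activity, which is why a population code sidesteps the expansion problem.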
Unfortunately, another problem emerges – what Gray calls the “superposition problem” – which concerns how the brain can parse a visual scene with more than one item. The proposed solution to this is precise spike timing, spike-timing-dependent plasticity (a prediction that was biologically verified in 1994, 13 years after the proposal), and the existence of “horizontal fibers” or lateral excitatory connections among neurons in visual cortex with similar receptive field properties (another prediction which was firmly established only after publication of the original theory). According to these accounts, the visual system represents different objects through oscillations among neuronal assemblies in sensory areas, which fire slightly out of phase with one another with respect to a more global ongoing oscillation. This is sometimes known as the “multiplexed synchrony” or “cross-frequency phase coupling” hypothesis.
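As a toy illustration of how phase could keep two objects apart, consider the sketch below. Everything in it is an assumption for the sake of the example: the 150 ms slow cycle, the two phase slots, the feature names, and the spike times are all made up, and the readout is far simpler than anything the cortex would do. Neurons that fire in the same portion of the slow cycle are read out as belonging to the same object.

```python
def phase_slot(spike_time_ms: float, slow_period_ms: float, n_slots: int) -> int:
    """Return which of n_slots equal phase bins of the slow cycle a spike falls in."""
    fraction_of_cycle = (spike_time_ms % slow_period_ms) / slow_period_ms
    return int(fraction_of_cycle * n_slots)


def group_by_phase(spikes, slow_period_ms: float, n_slots: int) -> dict:
    """Group neuron labels by the phase slot in which they fire.

    `spikes` is an iterable of (neuron_label, spike_time_ms) pairs.
    """
    groups = {}
    for neuron, t in spikes:
        slot = phase_slot(t, slow_period_ms, n_slots)
        groups.setdefault(slot, set()).add(neuron)
    return groups


# Hypothetical scene: a red square and a blue circle.
# Each object's feature neurons fire together, in their own part of the cycle.
spikes = [
    ("red", 10), ("square", 12),      # first cycle, early phase
    ("blue", 85), ("circle", 88),     # first cycle, late phase
    ("red", 160), ("square", 163),    # second cycle, early phase
    ("blue", 236), ("circle", 240),   # second cycle, late phase
]

groups = group_by_phase(spikes, slow_period_ms=150, n_slots=2)

if __name__ == "__main__":
    for slot, neurons in sorted(groups.items()):
        print(slot, sorted(neurons))
```

All four feature neurons are active within every cycle, so a readout that ignored timing would see an ambiguous mixture of red, blue, square, and circle. Sorting by phase recovers which color goes with which shape.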
These two theories differ in some important ways. For example, whereas the multiplexed synchrony hypothesis implies an important role for attention, hierarchical conjunctive encoding processes may occur pre-attentively. This latter supposition is compatible with evidence from Allen, Baddeley & Hitch that binding between color and shape appears to be automatic – in other words, it does not require the kinds of cognitive resources that are deployed in other tasks. Yet a new paper by Hollingworth suggests that object-position binding is sensitive to spatial manipulations. This, in turn, is consistent with evidence reviewed earlier this week that both deficits and “hyperactivity” of binding (in Balint’s syndrome and synaesthesia, respectively) rely on the parietal lobe, a region that is known to be important for spatial processing.
Hollingworth explicitly suggests that the hippocampus, another region of the brain that is thought to be sensitive to spatial manipulations, might be responsible for this form of binding, and he is not alone in implying the involvement of the hippocampus in visual binding. However, given the evidence reviewed earlier this week from Balint’s and synaesthesia, parietal cortex seems like a better contender for accomplishing binding.
Notably, parietal cortex has also been argued to represent “stimulus-response” mappings in task-switching paradigms, suggesting an even more general role for this region in arbitrary maintenance and binding (see here for even more on this). In other words, parietal cortex may maintain the currently-relevant patterns of stimuli and associated responses, and reorganize them appropriately when tasks change. So this region seems important not only for within-modality binding (as between color and shape, for example) but also for binding across perception and action (as in task-switching paradigms).
Multiplexed synchrony or cross-frequency phase coupling might also have a role in parietal representations. Gamma synchrony is prevalent at parietal electrodes in some target detection tasks, and may be multiplexed by theta oscillations when working memory is engaged. Likewise, phase synchrony between parietal and visual areas has been proposed as a central mechanism for “active, attentive behavior.”
In summary, there are many reasons to suppose that both multiplexed synchrony and hierarchical conjunctive encoding are at work in binding. In this way, the brain does seem to have developed multiple mechanisms for solving the widespread binding problem. Somewhat surprisingly, the parietal lobe may have a central role in this process, despite the multiplicity of the problem (whether binding sensations within or across modalities, or binding perception to action), which might lead one to suspect a more heterogeneous network of binding-related areas.