How do neurons in your brain encode the diversity of stimuli present in the world? This is one of the fundamental questions neuroscientists must answer about how the brain works. The world holds an infinite array of things to see, hear, touch, and so on, yet your brain has only a finite number of neurons to encode them. How is this infinite diversity assimilated by a machine with finite components?
To address this issue, I want to talk about a paper by Hromadka et al., published in the journal PLoS Biology. Hromadka et al. performed electrical recordings in the auditory cortex of unanesthetized rats. (We will talk about the significance of the rats being unanesthetized in a second.) They played different sounds to the rats and recorded the responses of many different neurons to those sounds.
They found that only a small number of neurons from their sample responded to any particular sound. This means that the representation of sounds is "sparse" as opposed to "dense." I will define these terms below.
One way to think about how the brain encodes the bewildering variety of things out in the world is to consider the following paradox: the grandmother neuron. The grandmother neuron is a concept originated by Jerry Lettvin. Basically it works like this. Your sensory systems break stimuli down into specific features. Your visual system breaks things down into lines and colors and movement, for instance. But in order for your brain to perceive these features as unitary objects, the activity originating in the feature neurons must eventually converge on the same place. (This is called the binding problem -- namely, how are features that are separated by the sensory systems bound together to form a single representation of an object?)
One way to solve the binding problem is by bringing all of the feature neurons together to activate one neuron. However, if all the features from one stimulus feed back to a single neuron, it would stand to reason that somewhere in your brain you have a grandmother neuron -- a neuron that is only active when you perceive your grandma.
Of course, organizing the system this way would lead to some farcically odd consequences. First, you have a limited number of neurons in your brain, but there are an infinite number of stimuli to encode. If only one neuron were used for each stimulus, you would quickly run out of neurons. Second, what if you lost that neuron? Would you no longer be able to remember your grandmother? Clearly, the idea that you have a single neuron encoding each object isn't going to work.
The solution to the grandmother neuron problem is an idea called ensemble encoding. Ensemble encoding means that you encode different objects in a set of neurons rather than in a single neuron. Say you have a billion neurons in your visual cortex. Maybe 1,000 or a million or whatever would be activated whenever you see your grandmother. But each of the members of that set could also be active in encoding other things. Ensemble encoding codes different stimuli combinatorially, allowing a significantly larger number of things to be encoded. (The number of combinations of 1,000,000 units taken 1,000 at a time is a very large number.) Also, because a whole set of neurons is active, the loss of any particular neuron is not a big deal. Say you lost 1 of the 1,000 neurons associated with the grandmother representation. That wouldn't be such a big deal because the remainder of the representation is still there.
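To get a feel for just how much capacity combinatorial coding buys, here is a small Python sketch. The 1,000-out-of-1,000,000 figures are the illustrative numbers from the paragraph above, not anything measured:

```python
import math

def log10_combinations(n: int, k: int) -> float:
    """log10 of 'n choose k', computed via log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(10)

# Grandmother-style coding: one neuron per stimulus, so a million
# neurons can encode only a million distinct things.
single_neuron_codes = 1_000_000

# Ensemble coding: 1,000-neuron sets drawn from the same million neurons.
ensemble_digits = log10_combinations(1_000_000, 1_000)
print(f"Distinct ensembles: roughly 10^{ensemble_digits:.0f}")
```

The number of possible ensembles has over three thousand digits, which is why reusing neurons across representations (and losing the occasional neuron) costs the code so little.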
As an aside, periodically you hear these totally misinterpreted stories in the news about a Halle Berry neuron found in X brain area. The misinterpretation here is not that the neuron doesn't activate selectively in response to Halle Berry. That may be true. The misinterpretation is that it is the only neuron activated by Halle Berry, or that only Halle Berry activates the neuron. The purpose of the neuron is not to encode Halle Berry. The purpose of the neuron is to encode a class of stimuli as part of an ensemble.
Ensemble encoding brings up another interesting question, though: how many neurons are involved in representing each stimulus? I am going to call this the encoding density. Neuroscientists use two terms to describe encoding density. Sparse encoding is when a stimulus is encoded by a large change in activity in a small number of neurons. If you have a billion neurons and only 5 are activated by a stimulus, then we would say that stimulus is sparsely encoded. On the other hand, when a stimulus is encoded by a small change in activity in a large number of neurons, we say it is densely encoded. These terms -- sparse vs. dense -- speak (generally) to how tightly tuned a neuron is for particular stimuli. If a particular neuron activates to a wide range of stimuli, it isn't tightly tuned. This is the case in dense encoding. On the other hand, tuning could be tight: the neuron might respond to only two or three different stimuli. Generally, the degree of tuning is inversely correlated with encoding density. We will see that this isn't always true, and a counterexample is posed in this paper.
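Encoding density can be put on a number. One standard way is the Treves-Rolls population sparseness measure; here is a toy Python sketch using two invented populations (the firing rates below are made up purely for illustration):

```python
# Hypothetical firing-rate responses (spikes/s) of 10 recorded neurons
# to a single stimulus; the numbers are invented for illustration.
sparse_pop = [0, 0, 0, 25, 0, 0, 0, 0, 30, 0]   # few neurons, large responses
dense_pop  = [2, 3, 1, 2, 3, 2, 1, 3, 2, 2]     # many neurons, small responses

def treves_rolls_sparseness(rates):
    """Treves-Rolls population sparseness: close to 1 for a dense code,
    approaching 1/N when a single neuron carries the whole response."""
    n = len(rates)
    mean_r = sum(rates) / n
    mean_r2 = sum(r * r for r in rates) / n
    return (mean_r ** 2) / mean_r2

print(treves_rolls_sparseness(sparse_pop))  # ~0.20 -> sparse
print(treves_rolls_sparseness(dense_pop))   # ~0.90 -> dense
```

The measure drops toward zero as the response becomes concentrated in fewer neurons, matching the intuitive definitions above.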
Another aside: If a neuroscientist were to ask, "what does a brain region encode?" sparse encoding is generally a bit easier to deal with than dense encoding. Remember that when you are doing experiments of this nature you can only record from a subset of neurons in a particular brain region. There are far too many to record them all. When you are dealing with a region that encodes things sparsely, the presence or absence of a stimulus causes a large change in neuronal activity. Thus, any large change in activity tends to suggest that whatever stimulus you were using is one the brain region encodes. Dense encoding can be troublesome. Because dense encoding involves neurons that are promiscuous in what they respond to -- they are broadly tuned -- you are never quite sure whether the stimulus you are using is really what is encoded in that brain region.
An example of this problem is research that attributes the processing of faces to prefrontal neurons. (Here is an example.) It is true that neurons in the prefrontal cortex respond to faces, and it is true that these face-selective neurons are anatomically segregated from other neurons in the prefrontal cortex. However, prefrontal cortical neurons are broadly tuned; they respond to a lot of stuff. Furthermore, lesions to the prefrontal cortex do not typically produce deficits in face perception. My point here is to illustrate the problem of over-interpretation when you are dealing with dense encoding. Brain regions that show dense encoding respond to a wide variety of stimuli, but a response does not necessarily mean that the region exists to encode the particular stimulus you are using.
Hromadka et al.
Now that we have our terminology and background, let's look at Hromadka et al. The authors wanted to find out whether sounds are encoded sparsely or densely in the auditory cortex of unanesthetized rats. Whether or not the rats are anesthetized does matter: the drugs used in general anesthesia can change the network properties of a brain region, which means that experiments done with and without anesthesia can yield divergent results.
To do this, they mounted the heads of rats in a stereotactic frame and recorded from neurons in the auditory cortex. They surveyed the responses of each recorded neuron to a set of sounds including pure tones, complex noises, and natural sounds. As opposed to the more standard way of recording neuronal activity using a metal electrode, the researchers in this paper used a glass electrode. This is important because when you use a metal electrode, you identify neurons by measuring activity: you have to see that a neuron is spiking to know that it is there. Using a glass electrode, on the other hand, you can tell that you are at a neuron by measuring the membrane potential (more negative inside the cell). This frees the paper from a bias that papers using metal electrodes might have: metal electrodes tend to underestimate the number of inactive neurons, while glass electrodes do not.
After surveying the response of many neurons to different sounds, the authors conclude that activity in the auditory cortex is sparsely coded. This is illustrated in the figure below (Figure 6 from the paper, click to enlarge).
The top row indicates the fraction of neurons that responded to particular types of stimuli. Note that increasing the volume of the stimulus (increased dB) does not increase this proportion greatly. The bottom row is a histogram of the probability that a particular neuron will change its firing rate (in spikes per second) by a given degree. Note that the most probable outcome is that a neuron will not change its firing rate at all.
These results strongly suggest that the auditory cortex is sparsely coded. Neurons generally only change their firing rates in response to a narrow range of stimuli with the majority of neurons remaining unchanged.
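As a sanity check on what such a histogram looks like, here is a toy simulation. The 5% responder fraction and the rate-change values are invented numbers, not the paper's data:

```python
import random
from collections import Counter

random.seed(0)

# Toy sparse population: each "neuron" has a 5% chance of a sizeable
# evoked rate change; otherwise its evoked rate equals baseline (change = 0).
changes = [random.choice([10, 20, 30]) if random.random() < 0.05 else 0
           for _ in range(1000)]

hist = Counter(changes)
print(hist[0] / len(changes))  # the histogram's mode sits at zero change
```

The resulting histogram is dominated by the zero-change bin, which is the qualitative signature of sparse coding in the figure.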
There are some interesting complexities in the data, however. The first is that some of the neurons in their sample are broadly tuned: they respond to a broad range of stimuli. This would seem to violate the principle that I described earlier -- that sharp tuning is associated with sparse representations. The authors explain this anomaly as follows:
Half of the cells (50%) did not show any significant change (increase or decrease) in firing rate during any response epoch, to any stimulus; an example of such an unresponsive neuron was shown in Figure 2H. At the other extreme, a few broadly tuned cells showed significant changes in firing rate in all (four or five) octave bins (i.e., across the whole frequency space tested) for at least one of the response periods.
It might appear that the sparseness we report is incompatible with the broad frequency tuning of rat auditory cortical neurons. However, we found that sparseness was not achieved through narrow frequency tuning. Instead, it arose through a combination of factors. First, 50% of the neural population failed to respond to any of the simple stimuli we presented. Second, responses were often brief; in many neurons, the change in firing rate was limited to just one of the three response epochs. Thus, sparseness of the response in time contributed to the overall sparseness of the population response. Finally, even when changes occurred they were typically small; the increase in firing rate exceeded 20 sp/s in only about a quarter of the statistically significant responses. As a result, only a small fraction of neurons responded vigorously to any tone even though frequency tuning was broad.
What this means is that while most neurons do not respond to a particular stimulus -- suggesting sparse encoding -- there is a fraction that are promiscuous responders -- broadly tuned neurons. The authors speculate that the promiscuous responders might actually be a different type of neuron. In contrast to pyramidal cells -- which are mostly excitatory -- the promiscuous cells might be inhibitory interneurons. This subset seems to violate the tight tuning-sparse coding rule I discussed earlier:
Although definitive identification of interneurons requires other techniques such as morphological reconstruction, it is likely that majority of highly responsive cells in our sample were not excitatory pyramidal neurons. We speculate that the high responsiveness of inhibitory interneurons might contribute to population sparseness of stimulus-evoked responses by simply inhibiting responses of pyramidal neurons in the auditory cortex. Such inhibition could then lead to sparse communication between the primary auditory cortex and higher sensory cortical areas in awake animals.
Complexities aside, revealing that the auditory cortex encodes stimuli sparsely has important implications. According to some theories, sparse coding is more energy efficient than dense coding. This would explain why we see sparse coding in other areas of the brain besides the auditory cortex. The authors talk about the significance of their work:
The population sparseness in the awake auditory cortex we described arose through a combination of three factors. First, half of neurons failed to respond to any tone we presented. Second, responses were often brief. Third, the amplitude of responses was usually low. Thus, even though the frequency tuning of single neurons is usually broad, only a small fraction of neurons responded vigorously and most neurons were silent.
Experimental evidence for sparse coding has been found in a range of experimental preparations, including the visual, motor, barrel, and olfactory systems, the zebra finch auditory system, and cat lateral geniculate nucleus. However, the sparseness of representations in the auditory cortex has not been explicitly addressed in previous work. Our results constitute the first direct evidence that the representation of sounds in the auditory cortex of unanesthetized animals is sparse.
Our data support the "efficient coding hypothesis," according to which the goal of sensory processing is to construct an efficient representation of the sensory environment. Sparse codes can provide efficient representations for natural scenes. Sparse representations may also offer energy efficient coding, where fewer spikes are required compared to dense representations. (Emphasis mine. Citations removed.)
For the reasons that I discussed above related to the grandmother neuron, there appears to be a certain trade-off going on in encoding density. On the one hand, you want things to be as sparse as possible because activating fewer neurons is energy efficient. On the other hand, if you encode using too few neurons, you lose out on coding complexity and the grandmother neuron paradox starts to come into play. It has also been argued that sparse representations are easier to identify and hence to learn than dense representations. If you are a higher order brain region attempting to decipher a stimulus in a lower order brain region, big changes in activity in a small number of neurons are easier to recognize than small changes in activity in a large number of neurons. With dense coding, you get the added issue that changes in activity may be difficult to resolve from baseline activity.
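This trade-off can be made concrete with a back-of-the-envelope calculation: in a fixed population, devoting more active neurons to each stimulus costs more spikes (more energy) but buys more coding capacity. A sketch in Python, with a hypothetical population of 1,000 neurons:

```python
import math

def log2_capacity(n: int, k: int) -> float:
    """log2 of the number of distinct k-active-out-of-n activity patterns."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

n = 1000  # hypothetical population size
for k in (1, 10, 100, 500):
    # energy cost grows linearly with k; capacity grows much faster
    print(f"active neurons: {k:4d}  capacity: {log2_capacity(n, k):7.1f} bits")
```

A grandmother-style code (one active neuron) yields only log2(1000), about 10 bits, while even a modestly sparse code of 10 active neurons distinguishes vastly more stimuli; past that point, each additional spike buys diminishing returns in capacity.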
Hat-tip: Faculty of 1000
Hromadka, T., DeWeese, M.R., Zador, A.M. (2008). Sparse Representation of Sounds in the Unanesthetized Auditory Cortex. PLoS Biology, 6(1), e16. DOI: 10.1371/journal.pbio.0060016
Maybe if my med school lectures were presented in blog-format, I would be reading them instead of procrastinating by reading this! :)
Even for steady auditory tones, it seems to me there may be many possible encoding schemes that a simple delta-rate-of-fire measurement might fail to usefully elucidate.
Has it been previously established that modulation of rate of fire is the primary information carrying channel in this case?
I don't think that it has been definitively established, melior.
The problem that you have with encoding in the auditory system is that sometimes you have to encode sound frequencies with higher oscillation rates than the maximum rate of neuron firing. Think of trying to encode a 1,000 Hz sound. A neuron can't fire that fast.
Lower order auditory systems like those in the cochlea and the cochlear nucleus solve this problem by phase locking with low frequency sounds and selectively activating neurons for high frequency sounds. Whether or not this partition is maintained into the auditory cortex, I really don't know.
I do know that the primary auditory cortex is tonotopically organized which suggests to me that it is less rate of fire and more which neurons are firing that is critical.
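To make the phase-locking idea concrete, here is a toy Python sketch. The 1,000 Hz tone and the 250 spikes/s firing ceiling are assumed numbers for illustration, not measurements: each simulated neuron locks to the same phase of the tone but skips cycles, and the pooled population still marks every cycle.

```python
import math

TONE_HZ = 1000        # stimulus frequency (cycles/s)
MAX_RATE_HZ = 250     # assumed ceiling on a single neuron's firing rate

period = 1.0 / TONE_HZ
skip = math.ceil(TONE_HZ / MAX_RATE_HZ)  # each neuron fires every 4th cycle

def spike_times(neuron_idx, n_cycles=40):
    """Phase-locked spikes: fire at the same phase of the tone,
    but only on every `skip`-th cycle, staggered per neuron."""
    return [c * period for c in range(neuron_idx, n_cycles, skip)]

# Pool `skip` staggered neurons: together they mark every cycle of the tone.
pooled = sorted(t for i in range(skip) for t in spike_times(i))
gaps = {round(b - a, 6) for a, b in zip(pooled, pooled[1:])}
print(gaps)  # {0.001}: one pooled spike per 1 ms period, i.e. 1,000 Hz
```

No single neuron exceeds 250 spikes/s, yet the population as a whole follows the 1,000 Hz tone -- the volley-principle idea behind phase locking.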
wait, so what does this have to do with the cocktail party effect? you lost me.
Great article! Mac's comment is so true. How come sites like this are so fun and easy to read and understand that reading them feels like a break from work, while journals themselves aren't like this?
One question though - you don't say a whole lot about what the sounds used actually are. I'm no expert on this at all but I've done a bit of reading on human phonetic perception and there's some reasonable evidence that the basic feature-detecting elements of whatever system does this are changed by experience. Basic frequency contrasts that are useful for distinguishing meaningfully-different sounds become more sensitive, while those that aren't become less sensitive and dulled - part of the reason that foreign accents are hard to understand. Also, context and previously-learnt categories and combinations seem to influence even initial, basic-feature-level encoding.
Is it possible that rats' coding may be less sparse than appeared here with sounds that have meaning and that appear in a more natural context? I don't know for sure but this study looks like it might be using only a small part of the auditory perception system and a part which, in nature, would very rarely be used in isolation.
Or, put another way, that encoding is sparse when it is of unfamiliar, unmeaningful sounds, using frequencies to which sensitivity in fine discrimination has never developed, with no context information to relate these sounds to - and that this doesn't necessarily generalise to natural, ecologically normal sound perception?
You're asking the million dollar question, Alan, namely: do the experiments that we are doing translate into natural environments? This is a fundamental question in neuroscience because it gets at whether this kind of analysis is effective at dissecting these systems. We would very much like to assume that a natural sound is merely the sum of simple sounds, but that assumption may or may not be true.
This is an area where I don't know very much, but what I can tell you is that people do ask whether it would be more efficient to encode natural sounds directly. In effect, this would be saying that activity in ensemble A equals bird call 1, rather than saying that activity in ensembles A, B, C..., which represent the components of bird call 1, together equals bird call 1.
Also, there is a recognized evolutionary benefit to responding to certain natural sounds specifically. For example, birds have mating calls. These mating calls can and should be specifically encoded. Most of the research I know about on so-called natural sounds goes on in song birds. Researchers look for ensembles that respond to sound features. There is evidence that humans have natural sounds as well. For example, our hearing is tuned to be sensitive in the range of speaking sounds (obviously). But it also turns out that most musical sounds are in the speaking range as well. Humans may have a particular subset of sounds that they respond well to. (This isn't even getting into the subject of word sounds -- phonemes. The special perception of speech has a whole field associated with it.)
The subject is complicated, but people are certainly looking into natural sounds.