Whereas yesteryear’s artificial neural network models were focused on achieving basic biological plausibility, today’s cutting-edge networks are modeling cognitive phenomena at the level of neurotransmitters. In a great example of this development, McClure, Gilzenrat & Cohen have an article in Advances in Neural Information Processing Systems where they propose a role for both dopamine and norepinephrine in switching behavior between modes of “exploration” and “exploitation.”
First, a little background. In artificial intelligence circles, the “temporal difference” algorithm has long been a well-known method for simulating reinforcement learning. Exciting advances in our understanding of the midbrain dopaminergic nuclei have demonstrated that something very similar is actually computed by the brain. As McClure et al. note, dopamine seems to be released as a function of how wrong the “predicted reward” of a given stimulus was: if you had vastly underestimated the reward you later receive, dopamine is released in larger quantities; conversely, if you had overestimated the reward you would later receive, dopamine release dips below its usual level.
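This reward-prediction-error idea is easy to see in a few lines of code. Below is a minimal sketch of a TD(0) update; the function name, parameter values, and scenario are illustrative choices of mine, not taken from the McClure et al. model. The prediction error `delta` plays the role of the phasic dopamine signal.

```python
def td_update(value, reward, next_value, alpha=0.1, gamma=0.9):
    """One TD(0) update: `delta` is the reward prediction error,
    positive when the outcome beats the prediction, negative when
    it falls short (analogous to a dip in dopamine release)."""
    delta = reward + gamma * next_value - value
    return value + alpha * delta, delta

# A stimulus predicted to be worthless yields a surprise reward:
v, delta = td_update(value=0.0, reward=1.0, next_value=0.0)
print(delta)  # 1.0 -- large positive error, i.e. extra dopamine
print(v)      # prediction nudged upward toward the true reward
```

Run this update repeatedly and the prediction converges on the true reward, at which point `delta` (and the dopamine analog) returns to baseline.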
Unfortunately, a system that relies only on reinforcement learning is purely exploitative. In other words, as soon as it finds something rewarding, it will continue to seek out that rewarding stimulus to the exclusion of all other novel things (some of which could be even more rewarding!). To solve this dilemma, McClure et al. propose that tonically higher levels of norepinephrine (i.e., noradrenaline) may encourage more exploratory behavior.
The proposed mechanism rests on the fact that norepinephrine release has two modes: phasic and tonic. The phasic mode involves transient increases in norepinephrine, which facilitate processing. In tonic mode, however, overall levels of norepinephrine are higher, which results in more “unpredictable” (i.e., exploratory) behavior.
What causes the “switch” between these two modes of norepinephrine release? McClure et al. suggest that the anterior cingulate cortex (ACC) may direct noradrenaline release by the locus coeruleus (LC). The ACC is sensitive to conflict (i.e., when there are multiple competing stimuli or responses), and when active, it will nudge the LC into tonic mode. Once a reward has been achieved, dopamine-related reinforcement learning processes (such as temporal difference learning) will tend to strengthen the rewarded response, thereby decreasing the conflict between this response and other possible but unrewarded responses. This drop in conflict decreases activity in the ACC, which in turn allows the LC to return to its default phasic mode.
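This feedback loop can be sketched schematically. In the toy code below, tonic norepinephrine is modeled as the “temperature” of a softmax choice rule (high temperature = flat, exploratory responding), and the ACC conflict signal is approximated by the entropy of the response competition. These equations are illustrative stand-ins of my own, not the actual McClure et al. network.

```python
import math

def response_probabilities(values, temperature):
    """Softmax over response values; higher temperature flattens the
    distribution, making behavior more exploratory."""
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def conflict(probs):
    """Entropy-like conflict signal, scaled to [0, 1]: maximal when
    responses compete equally, near zero once one dominates."""
    return -sum(p * math.log(p) for p in probs if p > 0) / math.log(len(probs))

def lc_temperature(conflict_level, phasic=0.1, tonic=1.0):
    """ACC conflict nudges the LC from phasic (low temperature)
    toward tonic (high temperature) mode."""
    return phasic + (tonic - phasic) * conflict_level

# Early on, the two responses are evenly matched: conflict is maximal
# and the LC sits in its exploratory tonic mode.
probs = response_probabilities([0.5, 0.5], temperature=0.5)
print(lc_temperature(conflict(probs)))  # 1.0: fully tonic

# After reinforcement strengthens one response, conflict collapses
# and the temperature falls back toward the phasic level.
probs = response_probabilities([2.0, 0.0], temperature=0.5)
print(lc_temperature(conflict(probs)))  # well below 1.0
```

The closed loop is exactly the one described above: exploration raises the chance of finding the rewarded response, learning then separates the response values, and the shrinking conflict signal shuts exploration back off.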
The authors implemented their hypothesis in a neural network model, fitted to data from monkeys on a simple task. Monkeys were rewarded for responding to one of two stimuli, and punished for responding to the other; this stimulus-reward mapping was sometimes reversed, after which LC neurons initially elevated their firing rate (i.e., transitioned to the tonic mode) and then eventually returned to a lower firing rate with transient bursts in activity (i.e., transitioned back to the phasic mode).
McClure et al. suggest that this model begins to solve the “exploration-exploitation” dilemma of intelligent agents: how do you know when to continue with your current behaviors, and when to seek out other possibilities? The fact that this solution involves norepinephrine is interesting, insofar as a similar model of dopamine release (also by Jon Cohen, summarized here) is claimed to solve the “flexibility-stability” dilemma.
The “stability-flexibility” dilemma refers to the fact that it is efficient to be able to limit your focus and actively maintain only currently-relevant stimuli – called “stability” because you are unlikely to be distracted. But this has a risk: when you need to switch tasks, this “attentional inertia” incurs a cost in terms of flexibility. Phasic and tonic dopamine release is thought to mitigate this dilemma, in that tonic dopamine release is associated with increased maintenance, whereas phasic bursts in dopamine release are associated with “updating” new information into that otherwise stable active maintenance system.
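To make the gating idea concrete, here is a toy sketch of a dopamine-gated working memory store; the class name, threshold, and signal values are all hypothetical illustrations rather than anything from the Cohen model. Below-threshold (tonic) dopamine protects the current contents from distractors, while a phasic burst opens the gate and swaps new information in.

```python
class GatedMemory:
    """Toy active-maintenance store with a dopamine-controlled gate."""

    def __init__(self):
        self.contents = None

    def step(self, new_input, phasic_dopamine, gate_threshold=0.5):
        # Tonic dopamine (below threshold): maintain contents (stability).
        # A phasic burst (above threshold): update with new input (flexibility).
        if phasic_dopamine > gate_threshold:
            self.contents = new_input
        return self.contents

wm = GatedMemory()
wm.step("task A", phasic_dopamine=0.9)      # burst: "task A" is gated in
wm.step("distractor", phasic_dopamine=0.1)  # no burst: contents protected
print(wm.contents)  # still "task A"
```

The dilemma shows up in the threshold: set it too high and the system never updates (pure stability); set it too low and every distractor overwrites the goal (pure flexibility).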
In summary, the temporal difference algorithm had been recognized as an efficient method of reinforcement learning, but it was always associated with a cost: once a rewarding stimulus is found, it becomes the focus of behavior at the expense of more exploratory behavior. Recent work in cognitive neuroscience has demonstrated how the temporal difference algorithm may be neurally implemented by dopamine fluctuations, and the McClure, Gilzenrat & Cohen paper reviewed in this post describes how a different neurotransmitter system may be used to solve the exploration-exploitation dilemma inherent in temporal difference learning.