Geoff Hinton has a new paper in Trends in Cognitive Sciences (TiCS) describing recent advances in algorithms used to train multilayered neural networks.
First, a little background: neural networks of sufficient size can approximate essentially any mathematical function (a famous result among neural network modelers, often called the universal approximation theorem). The tricky part is figuring out how to set the connection weights so that a given network actually computes the function you want.
This is where learning algorithms become necessary: unless you want to tweak each connection by hand until you get a working network (perhaps acceptable if you don't care how the brain works), you need to focus on how the network itself can learn those connections.
Hebbian learning is a standard algorithm that does seem to operate in biological neural networks, but it has a problem: it's not very good for training deep networks, that is, networks with multiple "hidden layers," in which only a small portion of the units receive input from outside the network. In the 1980s, a new learning algorithm was developed that could overcome these limitations, known as backpropagation of error, or just "backprop."
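To make the Hebbian idea concrete, here is a minimal sketch of the classic rule ("cells that fire together, wire together"): the change in a connection weight is proportional to the product of presynaptic and postsynaptic activity. The NumPy code, layer sizes, and learning rate below are illustrative choices of my own, not details from Hinton's paper.

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.01):
    """One Hebbian step: strengthen each weight in proportion to the
    co-activity of its presynaptic and postsynaptic units."""
    # The outer product gives a weight change for every pre/post pair.
    return w + lr * np.outer(post, pre)

# Toy example: 3 input units driving 2 output units.
w = np.zeros((2, 3))
pre = np.array([1.0, 0.0, 1.0])        # presynaptic activity
post = w @ pre + np.array([0.5, 0.1])  # postsynaptic activity (plus some external drive)
w = hebbian_update(w, pre, post)
```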
Unfortunately, this too has its problems: backprop relies on a teaching signal to tell the network when it's right and when it's wrong (a setup known as supervised learning). Some have criticized this form of learning as unrealistic; clearly, say the skeptics, humans are able to learn on their own without some "omniscient teacher" in the background.
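For contrast with the Hebbian sketch above, here is a minimal illustration of what a supervised "teaching signal" looks like in practice: a delta-rule update on a single sigmoid unit, where an externally supplied target drives the weight change. The squared-error loss, sigmoid unit, and learning rate are my own illustrative choices, not specifics from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def supervised_step(w, x, target, lr=0.1):
    """One delta-rule step: the 'teacher' supplies the target, and the
    error (target - output) drives the weight change."""
    y = sigmoid(w @ x)
    error = target - y            # the teaching signal supervised learning depends on
    grad = error * y * (1 - y)    # derivative of squared error w.r.t. the net input
    return w + lr * np.outer(grad, x)

w = np.random.randn(1, 3) * 0.1
x = np.array([0.2, 0.9, 0.4])
w = supervised_step(w, x, target=np.array([1.0]))
```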
In his new paper, Hinton describes an interesting way around these problems. The first step is to use one set of connections for information flowing into the network (bottom-up) and a different set for information flowing from the upper ("inner" or "cognitive") layers back out to the lower ("outer" or "sensory") layers (top-down). Hinton calls these "recognition" and "generative" connections, respectively.
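A rough sketch of the idea, assuming a single hidden layer and sigmoid units (the layer sizes and random weights are purely illustrative): the recognition weights map sensory activity up into hidden "causes," and a separate set of generative weights maps those causes back down into a predicted sensory pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4

W_recognition = rng.normal(scale=0.1, size=(n_hidden, n_visible))  # bottom-up
W_generative  = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # top-down

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

sensory = rng.random(n_visible)                   # "outer" (sensory) layer activity
hidden = sigmoid(W_recognition @ sensory)         # recognition: infer hidden causes
prediction = sigmoid(W_generative @ hidden)       # generation: predict the sensory data
```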
These generative connections are used to predict sensory data. Thus, a network that perfectly "understands" the world it inhabits should always generate sensory data that correspond to the "average" data it has observed over the course of training. Hinton points out that the difficulty lies in estimating that average distribution, but goes on to describe a framework which seems to satisfy these requirements.
The first step involves training a series of simple two-layer networks, a visible layer plus a hidden layer whose units are not connected to one another (known as "restricted Boltzmann machines," or RBMs), to reproduce their own training images. Once one RBM is trained, another is trained on top of the first, using the hidden units of the first RBM as the input units for the second, and so on until the desired level of accuracy is reached.
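Here is a simplified sketch of that procedure: a single RBM is trained with one-step contrastive divergence (CD-1), and then RBMs are stacked greedily, each trained on the hidden activities of the one below. Biases, mini-batches, and other practical details are omitted, and the layer sizes and toy data are my own choices rather than anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.05, epochs=10):
    """Train one restricted Boltzmann machine with CD-1.
    'data' is a (n_examples, n_visible) array of values in [0, 1]."""
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    for _ in range(epochs):
        for v0 in data:
            # Up pass: sample hidden units given the visible data.
            h_prob = sigmoid(v0 @ W)
            h0 = (rng.random(n_hidden) < h_prob).astype(float)
            # Down pass: reconstruct the visible units, then re-infer the hiddens.
            v1 = sigmoid(W @ h0)
            h1 = sigmoid(v1 @ W)
            # CD-1 update: positive phase minus negative phase.
            W += lr * (np.outer(v0, h_prob) - np.outer(v1, h1))
    return W

def stack_rbms(data, layer_sizes):
    """Greedy layer-wise training: each RBM's hidden activities
    become the input 'data' for the next RBM in the stack."""
    weights, layer_input = [], data
    for n_hidden in layer_sizes:
        W = train_rbm(layer_input, n_hidden)
        weights.append(W)
        layer_input = sigmoid(layer_input @ W)
    return weights

# Toy usage: 20 random 8-pixel "images", two stacked RBMs.
images = rng.random((20, 8))
stack = stack_rbms(images, layer_sizes=[6, 4])
```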
The second major step is to fine-tune the pretrained stack with a version of the "wake-sleep" algorithm, an unsupervised procedure (distinct from backpropagation of error) in which a "wake" phase uses activity driven by the recognition connections to adjust the generative connections, and a "sleep" phase does the reverse. This fine-tuning makes the network better at discriminating data, since the RBM training on its own is geared towards creating a system that is better at generating the data.
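For concreteness, here is a stripped-down sketch of a single wake-sleep step for one hidden layer with binary stochastic units: the wake phase adjusts the generative weights to reconstruct real data from the recognized hidden state, and the sleep phase adjusts the recognition weights to recover the hidden state that produced a "fantasy." Drawing the fantasy from a uniform prior (rather than from the network's own top level) and omitting biases are simplifications of my own, not details from Hinton's paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def wake_sleep_step(R, G, v, lr=0.05):
    """One wake-sleep step for a single hidden layer.
    R: recognition weights (visible -> hidden); G: generative weights (hidden -> visible)."""
    # Wake phase: recognize the data, then train the generative
    # connections to reconstruct the data from the inferred hidden state.
    h = (rng.random(R.shape[0]) < sigmoid(R @ v)).astype(float)
    G += lr * np.outer(v - sigmoid(G @ h), h)
    # Sleep phase: generate a "fantasy" from a random hidden state, then
    # train the recognition connections to recover that hidden state.
    h_fantasy = (rng.random(R.shape[0]) < 0.5).astype(float)
    v_fantasy = (rng.random(G.shape[0]) < sigmoid(G @ h_fantasy)).astype(float)
    R += lr * np.outer(h_fantasy - sigmoid(R @ v_fantasy), v_fantasy)
    return R, G

# Toy usage with 8 visible and 4 hidden units.
n_visible, n_hidden = 8, 4
R = rng.normal(scale=0.1, size=(n_hidden, n_visible))
G = rng.normal(scale=0.1, size=(n_visible, n_hidden))
R, G = wake_sleep_step(R, G, v=rng.random(n_visible))
```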
The end result is a learning mechanism that does not require labeled input data or an "omniscient teacher," but instead gets error signals from its ability to predict its own input. Although this approach does not include factors like lateral inhibition or lateral excitatory connections among units in a hidden layer, it seems like an interesting advance in unsupervised learning algorithms.