Predictive Nature: Externalizing Supervised Learning

Geoff Hinton has a new paper in Trends in Cognitive Sciences (TiCS) describing recent advances in algorithms for training multilayered neural networks.

First, a little background: neural networks of sufficient size can approximate any continuous function to arbitrary accuracy (a well-known result among neural network modelers, often called the universal approximation theorem). Unfortunately, the tricky part is figuring out how to set the connections in a network so that it computes the function you want.

This is where learning algorithms come in. Unless you want to tweak each connection by hand until you get a working network (which may be acceptable if you don't care how the brain works), you need to focus on how the network can learn.
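For a problem as small as XOR, setting the connections by hand is still feasible. The Python sketch below is a toy illustration, not taken from Hinton's paper: a 2-2-1 network of threshold units whose weights were picked by hand (the specific values and the names `step` and `xor_net` are illustrative) so that one hidden unit acts as OR, the other as AND, and the output fires for "OR but not AND." Finding weights like these automatically, for networks far too large to wire by hand, is the job of a learning algorithm.

```python
# Hand-set weights for a tiny 2-2-1 network of threshold units computing XOR.
# The weight values are chosen by hand for illustration; nothing here is learned.


def step(x):
    """Threshold unit: fires (1) if its net input is positive, else 0."""
    return 1 if x > 0 else 0


def xor_net(a, b):
    # Hidden unit 1 acts as OR: fires if at least one input is on.
    h_or = step(1.0 * a + 1.0 * b - 0.5)
    # Hidden unit 2 acts as AND: fires only if both inputs are on.
    h_and = step(1.0 * a + 1.0 * b - 1.5)
    # Output fires for "OR but not AND", which is exactly XOR.
    return step(1.0 * h_or - 2.0 * h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```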

Hebbian learning is a standard algorithm that does seem to operate in biological neural networks, but it has a problem: it is not very good at training deep networks (networks with multiple "hidden layers," i.e., layers of units that neither receive input from outside the network nor send output out of it). In the 1980s, a new learning algorithm was developed that could overcome these limitations: backpropagation of error, or just "backprop."
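A plain Hebbian rule strengthens a connection in proportion to the product of presynaptic and postsynaptic activity, Δw = η·x·y. The sketch below assumes that simplest form; the function name `hebbian_update` and the learning rate `lr` are illustrative. Notice that the update uses only the activity on either side of the connection and contains no term saying whether the network's output was correct, which is one way to see why it offers little guidance to hidden layers deep inside a network. Left unchecked, plain Hebbian weights also grow without bound.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """Return new weights with w[i][j] += lr * pre[i] * post[j] (plain Hebbian rule)."""
    return [[w + lr * pre[i] * post[j] for j, w in enumerate(row)]
            for i, row in enumerate(weights)]


# Two input units feeding one output unit, starting from zero weights.
weights = [[0.0], [0.0]]
# Input unit 0 is repeatedly active together with the output; input unit 1 stays silent.
for _ in range(5):
    weights = hebbian_update(weights, pre=[1.0, 0.0], post=[1.0])
print(weights)  # the weight from unit 0 has grown; the weight from unit 1 is unchanged
```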

Unfortunately, backprop has its own problems: it relies on a teaching signal that tells the network when it is right and when it is wrong (known as supervised learning). Some have criticized this form of learning as unrealistic; clearly, say the skeptics, humans are able to learn on their own without some "omniscient teacher" in the background.
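To make the "teaching signal" concrete, here is a minimal backprop sketch in Python, assuming a small 2-3-1 network of sigmoid units trained on XOR with squared error and per-example gradient descent. The network size, learning rate, epoch count, and the name `train_xor` are illustrative choices. The target `t` in `DATA` plays the teacher: every weight change traces back to the difference `y - t` between the network's answer and the supplied correct answer, propagated backward to the hidden units.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Training set: each input paired with the teacher's target output (XOR).
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]


def train_xor(n_hidden=3, lr=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    # w_ih[j] = [weight from input 0, weight from input 1, bias] for hidden unit j.
    w_ih = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    # w_ho = one weight per hidden unit, then the output unit's bias.
    w_ho = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_ih]
        y = sigmoid(sum(w_ho[j] * h[j] for j in range(n_hidden)) + w_ho[-1])
        return h, y

    def loss():
        return sum((forward(x)[1] - t) ** 2 for x, t in DATA) / len(DATA)

    initial = loss()
    for _ in range(epochs):
        for x, t in DATA:
            h, y = forward(x)
            # Output error: the network's answer compared with the teacher's target.
            delta_out = (y - t) * y * (1 - y)
            # Send that error back to each hidden unit through the (old) output weights.
            delta_hid = [delta_out * w_ho[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                w_ho[j] -= lr * delta_out * h[j]
            w_ho[-1] -= lr * delta_out
            for j in range(n_hidden):
                w_ih[j][0] -= lr * delta_hid[j] * x[0]
                w_ih[j][1] -= lr * delta_hid[j] * x[1]
                w_ih[j][2] -= lr * delta_hid[j]
    return initial, loss()


initial, final = train_xor()
print(f"mean squared error before: {initial:.3f}, after: {final:.3f}")
```

Without the targets in `DATA` there would be nothing to compute `delta_out` from, which is exactly the dependence on a teacher that the skeptics object to.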

In his new paper, Hinton describes an interesting way around these problems. The first step is to use different connections for information flowing into the network (bottom-up) than for information flowing from the upper ("inner" or "cognitive") layers back out to the lower ("outer" or "sensory") layers (top-down). Hinton calls these "recognition" and "generative" connections, respectively.

These generative connections are used to predict sensory data. Thus, a network that perfectly "understands" the world it inhabits should generate sensory data whose distribution matches that of the data it has observed over the course of training. Hinton points out that the difficulty lies in estimating that distribution, but goes on to describe a framework that seems to satisfy these requirements.
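One standard way to write this goal down (our notation, not necessarily Hinton's): with visible (sensory) units v and binary hidden units h, the generative connections should assign high probability p(v) to the training data, which means summing over every possible configuration of hidden causes. Inferring which causes produced a particular input requires the posterior p(h | v), whose normalizer runs over the same sum. With H binary hidden units that sum has 2^H terms, about 1.3 × 10^30 for H = 100, which is one way to see why the distribution is hard to compute exactly.

```latex
\max \; \sum_{n} \log p\bigl(\mathbf{v}^{(n)}\bigr),
\qquad
p(\mathbf{v}) = \sum_{\mathbf{h}} p(\mathbf{h})\, p(\mathbf{v} \mid \mathbf{h})

p(\mathbf{h} \mid \mathbf{v})
  = \frac{p(\mathbf{h})\, p(\mathbf{v} \mid \mathbf{h})}
         {\sum_{\mathbf{h}'} p(\mathbf{h}')\, p(\mathbf{v} \mid \mathbf{h}')}
```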

The first step involves training a series of simple two-layer networks, each with a layer of visible units and a layer of hidden units and no connections between units within the same layer (known as "restricted Boltzmann machines," or RBMs), to reproduce their own training images. Once one RBM is trained, its hidden-unit activities are used as the input data for a second RBM, and so on until the desired level of accuracy is reached.
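A minimal pure-Python sketch of this greedy stacking procedure, assuming binary units and one-step contrastive divergence (CD-1), a training rule Hinton commonly uses for RBMs. The class and function names, layer sizes, learning rate, and toy data are all illustrative. Because an `RBM` has no within-layer connections, every hidden unit can be computed independently given the visible vector, and vice versa. `train_stack` trains one RBM, then feeds its hidden-unit probabilities to the next RBM as that RBM's "data."

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Binary restricted Boltzmann machine: no visible-visible or hidden-hidden links."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # W[i][j] links visible unit i and hidden unit j; the same weights are used both ways.
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        # Hidden units are conditionally independent given v.
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        # Visible units are conditionally independent given h.
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """One contrastive-divergence step (CD-1) on a single training vector."""
        ph0 = self.hidden_probs(v0)   # hidden probabilities driven by the data
        h0 = self.sample(ph0)         # stochastic hidden states
        v1 = self.visible_probs(h0)   # the RBM's reconstruction of its input
        ph1 = self.hidden_probs(v1)   # hidden probabilities for the reconstruction
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.b_h[j] += lr * (ph0[j] - ph1[j])


def train_stack(data, layer_sizes, epochs=50, seed=0):
    """Greedy layer-wise training: each RBM models the hidden activities of the one below."""
    rng = random.Random(seed)
    rbms, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v)
        rbms.append(rbm)
        # The trained layer's hidden probabilities become the next RBM's training data.
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return rbms


# Toy "images": four 6-pixel binary patterns (illustrative only).
DATA = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
        [1.0, 1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]]
rbms = train_stack(DATA, layer_sizes=[4, 2])
```

Note that each RBM uses a single, symmetric set of weights in both directions; separate recognition and generative connections only appear once the stack is fine-tuned.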

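The second major step is to apply the "wake-sleep" algorithm to fine-tune the networks to be better at discriminating data (since the training of RBMs is geared toward creating a system that is better at generating the data). Despite its role as a fine-tuning step, wake-sleep is not a form of backpropagation: it adjusts the generative connections to predict activity in the layer below and the recognition connections to recover the hidden causes of the network's own "dreamed" data, using only locally available signals.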
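A minimal pure-Python sketch of the basic wake-sleep algorithm for a network with one layer of binary stochastic hidden units (a "Helmholtz machine"). The fine-tuning in Hinton's paper may use a variant of this procedure, and all names, sizes, and rates here are illustrative. In the wake phase, the recognition weights `R` infer hidden causes of real data and the generative weights `G` learn to predict the data from those causes; in the sleep phase, `G` runs top-down to produce a fantasy and `R` learns to recover the causes that generated it. Both updates are local delta rules, and no error derivatives are passed backward through layers.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def probs(weights, biases, inputs):
    # weights[k] holds the incoming weights of output unit k.
    return [sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


def sample(ps, rng):
    return [1.0 if rng.random() < p else 0.0 for p in ps]


def delta_rule(weights, biases, inputs, targets, predicted, lr):
    """Local update: nudge each unit's prediction toward its target using only pre/post activity."""
    for k, row in enumerate(weights):
        err = targets[k] - predicted[k]
        for i in range(len(row)):
            row[i] += lr * err * inputs[i]
        biases[k] += lr * err


class HelmholtzMachine:
    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        g = self.rng.gauss
        self.R = [[g(0.0, 0.1) for _ in range(n_visible)] for _ in range(n_hidden)]  # recognition: v -> h
        self.r_b = [0.0] * n_hidden
        self.G = [[g(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]  # generative: h -> v
        self.g_b = [0.0] * n_visible
        self.prior = [0.0] * n_hidden  # generative biases on the top-level hidden units

    def wake(self, v, lr):
        # Recognition connections infer hidden causes of real data bottom-up...
        h = sample(probs(self.R, self.r_b, v), self.rng)
        # ...and generative connections learn to predict the data from those causes.
        delta_rule(self.G, self.g_b, h, v, probs(self.G, self.g_b, h), lr)
        # The hidden-unit prior learns to match how often each cause was inferred.
        for j in range(len(h)):
            self.prior[j] += lr * (h[j] - sigmoid(self.prior[j]))

    def sleep(self, lr):
        # Generative connections "dream" a fantasy top-down...
        h = sample([sigmoid(b) for b in self.prior], self.rng)
        v = sample(probs(self.G, self.g_b, h), self.rng)
        # ...and recognition connections learn to recover the causes that produced it.
        delta_rule(self.R, self.r_b, v, h, probs(self.R, self.r_b, v), lr)

    def train(self, data, epochs=100, lr=0.05):
        for _ in range(epochs):
            for v in data:
                self.wake(v, lr)
                self.sleep(lr)

    def dream(self):
        """Visible-unit probabilities generated top-down from a sample of the prior."""
        h = sample([sigmoid(b) for b in self.prior], self.rng)
        return probs(self.G, self.g_b, h)
```

A possible usage: `hm = HelmholtzMachine(n_visible=4, n_hidden=2)`, then `hm.train([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])`, after which `hm.dream()` returns the network's own prediction of what its sensory input looks like. The wake-phase error `targets[k] - predicted[k]` is the "error signal from predicting its own input": it compares the generative prediction with the actual data rather than with a teacher's label.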

The end result is a learning mechanism that does not require labeled input data or an "omniscient teacher," but instead gets its error signals from how well it predicts its own input. Although this approach omits factors such as lateral inhibition and lateral excitatory connections among hidden-layer units, it seems like an interesting advance in unsupervised learning algorithms.
