Google in Your Brain? PageRank As a Semantic Memory Model

The world wide web can be understood as a giant matrix of associations (links) between various nodes (web pages). At an abstract level, this is similar to human memory, consisting of a matrix of associations (learned relationships, or neuronal connections) between various nodes (memories, or the distributed representations constituting them). In the new issue of Psych. Science, Griffiths et al. ask whether Google's famously accurate and fast PageRank algorithm for internet search might behave similarly to the brain's algorithm - whatever that might be - for searching human memory.

About PageRank

The PageRank algorithm is based on the assumption that the most important nodes in a network are those linked to by a large number of other nodes, which are themselves linked to by a large number of other nodes, which are themselves... and so on. This "recursive definition of importance" is formalized in Google's algorithm to efficiently calculate the rankings of different web pages, and to return those web pages which are most highly ranked and also match a given search term.
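For readers who want the recursion spelled out, the publicly described form of PageRank is usually written roughly as below. The symbols here (d, N, B(i), L(j)) are notation assumed for this sketch, not taken from Griffiths et al., and the damping factor d belongs to the public web-search formulation rather than necessarily to the analysis discussed in this post.

```latex
PR(i) = \frac{1 - d}{N} + d \sum_{j \in B(i)} \frac{PR(j)}{L(j)}
```

Here N is the number of nodes, B(i) is the set of nodes that link to node i, L(j) is the number of outgoing links from node j, and d is a damping factor between 0 and 1. The rank of i depends on the ranks of the nodes pointing to it, which is the "and so on" in the description above.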

Search in Human Memory

One way of graphing the associative structure of human memory is simply to ask human subjects to generate words which are strongly associated with other words. Averaged across many subjects, the frequency of those generated words reflects the "associate frequency" of the words in human memory. You might think of this result as "MemoryRank" instead of PageRank.

How well does PageRank account for human memory?

Griffiths et al. note one critical difference between PageRank and the "associate frequency" measure of human memory: the latter doesn't account for the fact that some cues are strongly associated with more words than others. PageRank's recursive definition of importance does capture this.
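A toy sketch may make the difference concrete. The six-node graph below is invented purely for illustration (it is not data from the study), associate frequency is approximated as a simple count of incoming associations, and the damping value of 0.85 is an assumption borrowed from the public description of PageRank. Nodes A and B are each named by exactly two cues, but A's cues are themselves popular while B's cues are named by no one.

```python
def in_degree(graph):
    """Count how many cues name each word (a stand-in for associate frequency)."""
    counts = {node: 0 for node in graph}
    for associates in graph.values():
        for word in associates:
            counts[word] += 1
    return counts


def pagerank(graph, damping=0.85, iterations=200):
    """Fixed-iteration PageRank; assumes every node is a key with at least one associate."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in graph}
        for cue, associates in graph.items():
            # Each cue splits its current rank equally among its associates.
            for word in associates:
                new_rank[word] += damping * rank[cue] / len(associates)
        rank = new_rank
    return rank


# Hypothetical cue -> associates graph (labels are placeholders, not real words).
toy_graph = {
    "A": ["P", "Q"],
    "B": ["P", "Q"],
    "P": ["A", "Q"],
    "Q": ["A", "P"],
    "R": ["B", "P"],
    "S": ["B", "Q"],
}

associate_frequency = in_degree(toy_graph)
ranks = pagerank(toy_graph)
print(associate_frequency["A"], associate_frequency["B"])
print(ranks["A"], ranks["B"])
```

In this toy graph the incoming-association count cannot tell A and B apart, while PageRank places A well above B because the cues that name A are themselves frequently named.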

To evaluate which ranking scheme better predicts human data, the two methods were used on a large set of verbal associations, all generated by humans in response to each of over 5,000 words. The result of this process was two ranks for each word in the set - one generated according to PageRank and one according to associate frequency. This list was then culled to include only those words generated by a set of 50 adults, each of whom had been asked to generate the first word that came to mind in response to each letter of the alphabet (excluding 5 low-frequency letters).

If PageRank or associate frequency were perfect models of human memory, then the human data should be completely predictable: humans should always pick those words which have the highest rank and start with the desired letter.
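The prediction described here can be sketched in a few lines. The function name `predict_first_word` and the scores in `toy_scores` are hypothetical, made up for illustration; in the study the scores would come from PageRank or associate frequency computed over the real association norms.

```python
def predict_first_word(scores, letter):
    """Return the highest-scoring word that starts with `letter`, or None if there is none."""
    candidates = [word for word in scores if word.startswith(letter)]
    if not candidates:
        return None
    return max(candidates, key=lambda word: scores[word])


# Hypothetical importance scores (not real data).
toy_scores = {"apple": 0.12, "animal": 0.30, "bird": 0.08, "blue": 0.21}

prediction_a = predict_first_word(toy_scores, "a")
prediction_b = predict_first_word(toy_scores, "b")
prediction_z = predict_first_word(toy_scores, "z")
```

A perfect model would have every participant produce exactly these predicted words; the better of two ranking schemes is the one whose top-ranked words more often match what people actually say.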

The result:

"PageRank outperformed both associate frequency and word frequency as a predictor" of those words generated by humans in response to each letter of the alphabet. And this wasn't due merely to the training set - Griffiths et al. manipulated the training set in various ways, and in all cases, PageRank came out on top (relative to associate frequency and word frequency).

What does this mean?

It turns out that PageRank is mathematically equivalent to a large number of other formalisms that are used in cognitive science. For example, severely limited connectionist networks (limited insofar as connection weights are equalized across all projections from a certain node) are mathematically equivalent to PageRank: the activation in such a network should ultimately settle on those nodes in proportion to their PageRank. Likewise, PageRank can also be considered an estimate of "priors" in a Bayesian network (with some simplifying assumptions about likelihood).

So Google's PageRank may accomplish network search in ways that can also be implemented in other frameworks widely used in cognitive psychology. However, PageRank (at least as it is known in the public domain) makes the strongly simplifying assumption that all associations from a particular node equally contribute to the importance of each of the connected nodes.

Although this assumption may be necessary for Google's purposes, it is extremely clear that no such limitation exists in the brain. After all, the most widely recognized algorithm for neural computation - Hebbian learning - works precisely because it modifies the relative weights of one node to another independently from the weights of that node to all other nodes.

Is Google in my brain?

No one is suggesting that Larry Page has discovered the secret to the organization of human memory. In fact, it's clear that some of PageRank's (public) assumptions about the structure of networks do not hold - for example, the idea that the importance of a single node is distributed equally through all its connections. Much better models of verbal processing abound in cognitive psychology (see, for example, LSA). Still, Griffiths et al. compellingly demonstrate that the advantageous qualities of PageRank do indeed generalize from the world wide web to the semantic networks present in the brain.

I'm glad to hear that companies like Developing Intellig are striving to expand across our country, Spain. I hope they grow a great deal, and I bet they will: well-organized, solid work bears fruit. I invite and encourage more companies to lose their fear of expanding across our planet.

Patricia Gonzalez Vargas
http://www.hotsale.es
Online shopping center

Google is always in my brain. Google is interesting and challenging. It is important to have a PageRank because most people look at the PageRank of a site. For me, PageRank is just like the reputation of the site.

That's a great article. I've been computing Wikipedia's PageRank as part of my Information Retrieval course. I've got the Irish Wikipedia going right now, in fact. I'm doing it in this language because the English Wikipedia is so large that it seems impossible to compute its PR on a standalone machine (i.e., I'd have to write a distributed version).

Back to the brain, the English Wikipedia has 2 million vertices (articles) and 63 million edges (links). The complexity of computing the PageRank of a graph is O(|H|log(1/e)) floating point operations, where |H| is the number of edges and e is the precision required, usually taken to be 10^-8. The complexity is independent of the number of vertices (Bianchini et al., 2005). This means that I can use a theoretical algorithm to compute the PR of Wikipedia on IBM's BlueGene/P, which can compute .5 quadrillion floating point operations per second, in 1/10^6 seconds. (It would actually be much faster than this because the distributed versions have decent speedup.)

Back to the brain (*ahem*), computing the PR of a semantic network is cute and all, but what about the whole brain? I'll punt on that since I've got good numbers for just cortex: It's O(.15 quadrillion*log(1/10^-8)), which is just larger than a quadrillion, which will only take a couple of seconds. Of course, our algorithms aren't perfect - if they were I could crunch these numbers on my desktop in a limited amount of time. But we can make up for it in hardware - BlueBrain anticipates a human scale brain simulation (without relevant connectivity - just scale) within the next ten years. Did I mention that backprop has complexity O(n)? Of course, they are using fancy pants differential equations :)

By Brian Mingus (not verified) on 29 Nov 2007
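The back-of-the-envelope figures in the comment above can be reproduced with a short calculation. This is a sketch under two assumptions that the comment does not state: the logarithm is taken base 10 (which is what makes the quoted numbers come out), and the constant hidden by the big-O notation is treated as 1. The variable names are made up for illustration.

```python
import math

PRECISION = 1e-8          # e, the required precision quoted in the comment
FLOPS_PER_SECOND = 0.5e15  # ".5 quadrillion floating point operations per second"


def pagerank_operations(edge_count, precision=PRECISION):
    """Estimate |H| * log10(1/e), treating the big-O constant as 1 (an assumption)."""
    return edge_count * -math.log10(precision)


wikipedia_edges = 63e6   # 63 million links
cortex_edges = 0.15e15   # ".15 quadrillion" connections, as quoted

wikipedia_ops = pagerank_operations(wikipedia_edges)
cortex_ops = pagerank_operations(cortex_edges)

wikipedia_seconds = wikipedia_ops / FLOPS_PER_SECOND
cortex_seconds = cortex_ops / FLOPS_PER_SECOND

print(wikipedia_ops, wikipedia_seconds)
print(cortex_ops, cortex_seconds)
```

Under these assumptions the Wikipedia estimate comes to about a millionth of a second and the cortex estimate to a little over a quadrillion operations, or a couple of seconds, in line with the comment. With a natural logarithm instead, both figures would be roughly 2.3 times larger.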

This is really interesting. I am getting lots of fluctuation on my different sites, and some of the pages I am linking to are claiming high PR, when the current Google PR is low or zero. Thanks for the post.