Google Predicts Memory, and Probably Everything Else

There's a paper in the December 2007 issue of Psychological Science titled "Google and the Mind: Predicting Fluency With PageRank." Here's the abstract:

Griffiths, T.L., Steyvers, M., & Firl, A. (2007). Google and the mind: Predicting fluency with PageRank. Psychological Science, 18(12), 1069-1076.


Human memory and Internet search engines face a shared computational problem, needing to retrieve stored pieces of information in response to a query. We explored whether they employ similar solutions, testing whether we could predict human performance on a fluency task using PageRank, a component of the Google search engine. In this task, people were shown a letter of the alphabet and asked to name the first word beginning with that letter that came to mind. We show that PageRank, computed on a semantic network constructed from word-association data, outperformed word frequency and the number of words for which a word is named as an associate as a predictor of the words that people produced in this task. We identify two simple process models that could support this apparent correspondence between human memory and Internet search, and relate our results to previous rational models of memory.

I don't really have anything to say, and you can probably glean everything you need to know about the methodology from the abstract, though if you want to delve deeper you can read the paper for free here. If you're wondering what they conclude about rational models of memory (à la John Anderson), it's pretty simple: rational models of memory posit that retrieval is a statistical, and specifically Bayesian, process, and so is an internet search, so human memory and internet search may work in similar ways. I don't really know enough about internet searches to venture a guess as to how similar they might be to human memory, but I suspect there are important differences even in the statistical processes involved. Anyway, here's the meat of the paper's conclusion:
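For readers curious what "PageRank computed on a semantic network" actually involves, here is a minimal sketch of the standard power-iteration algorithm run over a directed word-association graph. The words and association links below are invented for illustration; the paper used networks built from real word-association norms, and the damping value of 0.85 is simply the conventional default, not necessarily what the authors used.

```python
# Minimal PageRank via power iteration. An edge w -> v means that people
# given the cue word w often produce v as an associate.

def pagerank(graph, damping=0.85, iterations=100):
    """graph: dict mapping each word to the list of words it cues."""
    nodes = list(graph)
    n = len(nodes)
    rank = {w: 1.0 / n for w in nodes}
    for _ in range(iterations):
        new_rank = {w: (1.0 - damping) / n for w in nodes}
        for w, neighbors in graph.items():
            if neighbors:
                share = damping * rank[w] / len(neighbors)
                for v in neighbors:
                    new_rank[v] += share
            else:
                # Dangling node: spread its mass uniformly over all nodes.
                for v in nodes:
                    new_rank[v] += damping * rank[w] / n
        rank = new_rank
    return rank

# Toy association network (hypothetical data, not from the paper).
graph = {
    "animal": ["dog", "cat"],
    "dog":    ["cat", "bone"],
    "cat":    ["dog"],
    "bone":   ["dog"],
}
ranks = pagerank(graph)
# A word named as an associate by many well-connected words ("dog" here)
# gets the highest PageRank, i.e., it is the predicted first word produced.
print(max(ranks, key=ranks.get))
```

The intuition matches the abstract: a word's score depends not just on how many words cue it (the "number of associates" baseline), but on how well-connected those cueing words themselves are.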

The relationship between PageRank and fluency reported in this article suggests that the analogy between computer-based solutions to information retrieval problems and human memory may be worth pursuing further. In particular, our approach indicates how one can obtain novel models of human memory by studying the properties of successful information-retrieval systems, such as Internet search engines. Establishing this correspondence is important not just for the hypotheses about human cognition that may result, but as a path toward developing better search engines.


Yo! What's crackin'? This research is like Boo Yah!

It sure is cool to put "Google" in the title of any article, especially an article on how the mind works. In general I am a big fan of Psych Science, but it often publishes flash over substance.

The basic problem with semantic networks (e.g., Google's PageRank) is that they are NOT built to represent meaning. So, if we are going to understand how the mind works by looking to Google's PageRank, good luck.

There are many criticisms of semantic networks, and some of them are quite brutal (see Fodor & Pylyshyn, 1988), but barring outrageous counter-arguments, Philip Johnson-Laird (1984) offers a nice critique in what might be called a "classic" paper. Here is a nice quote from that paper that should help one digest the problem with this class of representational models:

One of the things that you know about poodles is that they are dogs, and this sort of information can be represented in a semantic network. However, you also have some knowledge about what it is for something to be a poodle; you have a concept of what poodles are, and this knowledge enables you to identify poodles, to establish complex intensional relations involving them, and to verify assertions about them. Of course, no one can definitively classify any entity as either a poodle or not a poodle: There are probably no necessary and sufficient conditions for poodlehood (see Putnam, 1975), but without some knowledge of what determines the extension of a term you can hardly be said to have grasped its meaning. This sort of knowledge is not represented in semantic networks, and there is no immediate way in which it could be represented because networks lack connections to representations of the world. They only provide connections between words. Unfortunately, they cannot even give a complete explanation of intensional phenomena, because a proper account of ambiguity, anomaly, instantiation, and inference turns out to depend on access to extensional representations. (p. 313)

Johnson-Laird, P.N., Herrmann, D.J., & Chaffin, R. (1984). Only connections: A critique of semantic networks. Psychological Bulletin, 96(2), 292-315.

By john dennis (not verified) on 08 Jan 2008 #permalink

In response to john dennis, (a) even though semantic networks aren't built to represent meaning, meaning begins to come from them in the sense that most (definitely not all, I get that) links occur because the linked page has some sort of meaningful association with the linking page. Your page is an excellent example: link to the article because you're talking about it, link to your archives because they're what you wrote, etc. At a quick glance, probably the least meaning-full links on the page are the ads, and even those have a passing acquaintance with the general topic of science.
(b) While a very large percentage of human memory connections are meaning-full, not all of them are. For example, the meaning of a Spoonerism has nothing to do with the original phrase, the two are just associated by sound patterns.

In response to ceresina:

a) Semantic networks and the larger category of connectionist networks are in some sense just like links on an internet page, so the analogy to Google's PageRank is a good one. As Fodor and Pylyshyn put it, the links in both are "not generalized pointers that can be made to take on different functional significance by an independent interpreter, but are confined to meaning something like 'sends activation to'. The intended interpretation of the links as causal connections is intrinsic to the theory. If you ignore this point, you are likely to take Connectionism to offer a much richer notion of mental representation than it actually does."

Fodor, J.A., & Pylyshyn, Z.W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1-2), 3-71.

b) I don't get this, on three levels: the factual meaning, the larger meaning, and the largest meaning of ceresina's comment.

First, the factual meaning. Spoonerisms were named after William Archibald Spooner, who had a speech deficit whereby he would switch consonants, vowels, and phonemes in spoken speech. And so a spoonerism is just that. For example, Spooner once said, *You'll soon be had as a matter of course* instead of *You'll soon be mad as a Hatter of course*. Interestingly, spoonerisms (exchanges) tell us something about the serial organization of speech, because only some types of spoonerisms are possible. In fact, the professor I have TAed for, for the past 4 years, Dr. Peter MacNeilage, has included spoonerisms in his influential theory on the evolution of language.

Second, the larger meaning. The idea that the etymology of words somehow destroys meaning is just wrong. The process of tracing a word and its cognates to its earliest occurrence actually provides more meaning to the word. Lexicographers will be the first to tell you so. Etymology often shows you the power of metaphor: the root of the fork in the river is the physical pitchfork, and the same goes for the expression fork it over. George Lakoff, Mark Johnson, and a cast of thousands have all talked extensively about the etymology of words and the metaphorical nature of language (Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press).

Third, the largest meaning. How can you have a meaningless memory connection? I have tried for at least 5 minutes to dissect this, but I get nothing. Memory connections are always meaningful; indeed, that is what makes the connections possible.

By john dennis (not verified) on 09 Jan 2008 #permalink

Establishing this correspondence is important not just for the hypotheses about human cognition that may result, but as a path toward developing better search engines.

It would seem that further research could be good for both man and machine. Even though we created the machines it is obvious every day that they don't "think" exactly the way we do.
Dave Briggs :~)

John Dennis: I don't understand your arguments, but I suspect it's because I misunderstood your original use of the word "meaning." I thought you were referring to a definitional meaning, that is, to the idea of what a word means, in the general-public sense of the word "means." You can go to a dictionary & get an idea of what a word means, and the meaning will be refined as you interact with it -- sort of what I'm trying to do as I'm describing what I mean by "mean." What makes me think this is that, to me, your first comment says that semantic networks don't represent meaning, but your second implies they do, especially "causal connections."
Following up, therefore, a Spoonerized phrase is an example of a "meaningless" connection because it is not connected by definitional meaning to the original phrase. Since I can't think of any off the top of my head, I'll use Wikipedia's first example: "The lord is a shoving leopard." The only connection from "loving shepherd" to "shoving leopard" is the sounds, not the definition. This naturally means I don't understand what you're responding to with the etymology point; I'm in agreement with you on it. And I also agree that of course there's always a connection between ideas, even the most seemingly random non-sequitur; it just isn't always semantic/definition-based. To be clear, I would even say *most* connections are heavily semantic; to me, my inability to think up a Spoonerism on my own is an example of this -- I couldn't turn off (for lack of a more elegant phrase) the semantic connections in order to find one that's based on something else.
I think we're talking from two very different aspects of the field, and therefore probably at cross-purposes. If we understood the "jargon" of each other's niche, it's entirely possible we would be closer in agreement than we seem to be now.
Finally, I responded because I don't think that Griffiths et al. are arguing that meaning is actually represented this way, based on their work as a whole. For an extremely far-flung but pithy example, Love, Medin, & Gureckis (2004) dismissed several of their collaborators for being more interested in describing a computational model of categorization rather than an algorithmic or implementational one.

Ceresina, we are talking past each other, and I think that we would agree on many issues pregnant in this discussion (especially given your heads-up to the Love et al. paper), but ...

1) Causal connections are not meaningful in Fodor & Pylyshyn's world nor in mine. They imply and rely on meaning, but they don't have meaning.

2) The Spoonerism stuff, at first glance might seem like it is way off topic, but I don't think fundamentally it is, because Spoonerisms (like Google's PageRank) are a really really cool way of bringing up some really profound epistemological issues - like the relationship between words, concepts and definitions. Give me a second to state the epistemological issue:

A definition applies to a concept, not a word. A word is a name given to a concept, it is not the concept itself.

So, because Spoonerisms are a play on words that matches the sound patterns of the intended words, the meaningfulness of the Spoonerism is contained in the intended phrase. This means that the words of the Spoonerism point to the intended concept.

To me it is just COOL that Spoonerisms can bring up such profound topics.

3) I really, really, really like: Love, B.C., Medin, D.L., & Gureckis, T.M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111(2), 309-332.

By john dennis (not verified) on 14 Jan 2008 #permalink

It seems to me that one has to step up one hierarchical level in order to get the full picture here. The Google search engine's links, as existents, have meaning because the pages to which they link have meaning. This is much like the brain, perhaps: a substrate of distributed connectivity upon which units of meaning (or proto-meaning) are situated. Those links have meaning, on the other hand, as subjects (the subject, that is to say, of some of the commentary attached to this page), but as mere existents (onta) they can be rendered meaningless by belonging to a network that connects to no pages, no content. This is supported by the fact that links not connected to content cannot produce search-engine results, regardless of how many are interconnected with each other.

It does not seem unreasonable to me to see the experiment as a whole as a larger model for the memory operations of the brain. The Google PageRank algorithms provide information but no meaning. They search out the most "important" higher-level symbols (words) corresponding to a lower-level symbol (a letter). They don't know symbols from squat. All they know is a pattern of occurrence at the level of "meaning," a level inhabited by collections of symbols. The brains of the participants do much the same with the same lower-level symbol.

We compare the human letter-word (i.e., semantic) performance with the PageRank semantic performance. Then we compare the results to other semantic models that have been shown to have a relatively high degree of predictive success, only to find that, in the present experiment, PageRank outperforms them. (A result open to quite a wide range of interpretations, by the way, given the limited data.) Including that stage, then, we have: 1) higher-level symbolic constructs (web pages/participants in an experiment); 2) lower-level symbolic constructs (the memory trigger for both PageRank and participants), which must come from somewhere in both the experiment and memory at large in the world; 3) an agent to receive the results and act upon and/or assess and learn from them. In both PageRank and the experiment, at present, #3 must be supplied by a still higher hierarchical level available, at this level of complexity, only via the human brain, as the full data-set of this experiment strongly argues.