Developing Intelligence

It’s been said that psychology is a primitive discipline, stuck in the equivalent of pre-Newtonian physics. Supposedly we haven’t discovered the basic principles underlying cognition, and are instead engaged in a kind of stamp collecting: arguing about the probability that various pseudo-regularities are real, without any overarching theory.

Some of this criticism is deserved, and some of it applies more widely to the life sciences. Perhaps the only underlying principle in biology is “why use just one solution when using many works just as well?” (Natural selection is no exception to the rule – see biased gene conversion). This thwarts any Newton-esque induction of underlying principles in the life sciences, and holds double for cognition, an emergent phenomenon of myriad interacting complex biological systems.

Nonetheless, the naysayers might take a look at Brown, Neath & Chater’s 2007 Psychological Review article, “A Temporal Ratio Model of Memory” (freely available online). Brown et al. propose a theory they call SIMPLE, for “scale-independent memory, perception and learning,” and it’s likely to organize a number of stamp collections.

The basic principles are these:

1) Any complete model of memory will have to account for phenomena that occur across different timescales, in the same way that the law of gravity holds across many (although not all) spatial scales. Accordingly, SIMPLE uses a single mechanism to explain phenomena previously attributed to distinct short- and long-term forms of memory.

2) SIMPLE posits that memory follows the same law as that of other forms of discrimination, including something as simple as distinguishing an item from others in terms of its weight (or line length, or loudness). This law is that of Weberian compression, more widely known as the Weber-Fechner law, first proposed in the 1800s.

3) The particulars of the model assume that memory can be understood as a multidimensional space, one crucial dimension of which is the temporal distance between the encoding of an item and the time of retrieval. Of the many factors that can make retrieval difficult, one is the ratio between the temporal distance of the to-be-retrieved item and those of other items. This emphasis on ratios is the centerpiece of the Weber-Fechner law.
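To make the ratio idea in points 2 and 3 concrete, here is a minimal Python sketch of Fechner's logarithmic law, S = k·ln(I/I0). The function names and the values of k and I0 are illustrative assumptions, not anything taken from Brown et al.; the sketch only shows that on a logarithmic scale, pairs of stimuli with the same ratio are equally far apart regardless of their absolute magnitude.

```python
import math

def fechner_sensation(intensity, threshold=1.0, k=1.0):
    """Perceived magnitude under Fechner's law: S = k * ln(I / I0).

    `threshold` plays the role of I0 (the detection threshold); both it and
    `k` are placeholder values, not estimates from any experiment.
    """
    return k * math.log(intensity / threshold)

def sensation_difference(a, b):
    """Perceived difference between two stimulus intensities."""
    return abs(fechner_sensation(a) - fechner_sensation(b))

# Three pairs that share a 2:1 ratio at very different absolute scales.
for a, b in [(10, 20), (100, 200), (1000, 2000)]:
    print(a, b, round(sensation_difference(a, b), 4))  # 0.6931 each time

# Same absolute gap (10 units) as the first pair, but a much smaller ratio.
print(1000, 1010, round(sensation_difference(1000, 1010), 4))
```

The last pair differs by the same 10 units as the first, yet its perceived difference is tiny, because only the ratio matters.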

With these basic tenets, SIMPLE can capture a number of phenomena. To detail just one example, people are better able to recall items at the beginning or end of a list (known as primacy and recency effects, respectively) than those in the middle. This is true regardless of whether they’re attempting to learn a sequence, retrieve items in order, retrieve them in any order, remember various locations, or merely recognize items as familiar (as long as rehearsal is discouraged). The effect holds not only across tasks but also across timescales: items in the middle of a particular list are remembered worse, as are items encountered in the middle of the entire experimental session. In particular, the magnitude of recency effects reflects the log of the ratio between the interval separating the to-be-remembered items at encoding and the retention interval. In other words, it is temporally scale-free. Other aspects of memory are scale-free as well, including the proportion of errors at each position relative to overall performance, and the tendency for the order of nearby items to be confused (which holds from milliseconds to weeks).

This scale independence is taken to reflect Weberian compression, which can be understood via a telephone pole analogy: items in memory may become harder to discriminate with increasing time in the same way that evenly spaced telephone poles appear more compressed the farther they recede into the distance. This can be formalized by plotting the distance of each item (in seconds) from the time of retrieval, taking the logarithm of each distance, and comparing neighboring items on that log scale; the difference between two items’ logs is simply the log of the ratio of their distances. If this separation is large enough, the item should be correctly retrieved.
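The telephone-pole picture can be sketched numerically. The Python below assumes a hypothetical list of five items presented one per second and measures each item's distance from the moment of retrieval; the function name and numbers are illustrative, not taken from the paper. It computes the gaps between neighboring items on a log-time axis.

```python
import math

def log_gaps(n_items, spacing, delay):
    """Gaps between adjacent items on a log-time axis, oldest pair first.

    Items are presented every `spacing` seconds and retrieval happens `delay`
    seconds after the last item, so each item's distance from retrieval is
    delay + spacing * (number of items presented after it).
    """
    distances = [delay + spacing * (n_items - 1 - k) for k in range(n_items)]
    logs = [math.log(t) for t in distances]
    # Difference of logs == log of the ratio of the two distances.
    return [logs[k] - logs[k + 1] for k in range(n_items - 1)]

gaps = log_gaps(n_items=5, spacing=1.0, delay=1.0)
print([round(g, 3) for g in gaps])  # -> [0.223, 0.288, 0.405, 0.693]

# A longer retention interval squeezes even the most recent pair together.
gaps_late = log_gaps(n_items=5, spacing=1.0, delay=10.0)
print([round(g, 3) for g in gaps_late])
```

The oldest pairs end up closest together on the log axis, like distant poles, and lengthening the retention interval shrinks even the most recent gap, from about 0.693 to about 0.095.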

This same principle holds in a very different domain: that of stimulus identification. Brown et al. describe a typical experiment in which subjects are exposed to items differing in length, weight, magnitude, or duration, each with a distinct label, and must then identify the label associated with each item when the items are presented in random order. These tasks also reveal serial position effects, such that it’s hardest to identify items that lie in the middle of the distribution (whether in terms of amplitude, weight, length, area, semantic value, spatial position, brightness, numerosity, or temporal duration). While the recency and primacy effects are symmetric when the distribution of items is Gaussian along the relevant dimension, they become asymmetric when the items follow a positively skewed distribution, as would occur with Weberian compression of temporal duration.
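As a rough illustration of the identification result, the Python sketch below borrows a SIMPLE-style summed-similarity rule and applies it to stimulus magnitudes. This is an assumption for illustration: similarity falls off exponentially with distance along the stimulus dimension, and discriminability is the inverse of summed similarity. Two hypothetical stimulus sets are compared, one evenly spaced (and therefore symmetric) and one positively skewed, crowded at the low end with a sparse tail at the high end. The spacing values and c are invented.

```python
import math

def discriminability(positions, c):
    """Inverse summed similarity for items on one stimulus dimension.

    Similarity between items i and j is exp(-c * |x_i - x_j|); each item's
    similarity to itself is 1 and is included in the sum.
    """
    return [1.0 / sum(math.exp(-c * abs(xi - xj)) for xj in positions) for xi in positions]

# Hypothetical stimulus sets (arbitrary units).
even = [1, 2, 3, 4, 5, 6, 7]     # evenly spaced: symmetric around the middle
skewed = [1, 2, 3, 4, 6, 9, 13]  # crowded at the low end, sparse tail at the high end

d_even = discriminability(even, c=1.0)
d_skewed = discriminability(skewed, c=1.0)

print([round(d, 3) for d in d_even])
print([round(d, 3) for d in d_skewed])
```

With these made-up values, the evenly spaced set yields a symmetric bow (both endpoints beat the middle equally), while in the skewed set the sparse high end is far easier to identify than the crowded low end.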

The authors formalize these notions with a series of equations. Central among them: the discriminability of an item is equal to the inverse of its summed similarity to all potentially retrievable items, where similarity is a function of the ratio of temporal intervals (as described earlier). This formulation has an interesting conceptual similarity to entropy (for example, an LSA-based measure of response entropy); both formalisms capture competition among responses by summing across a function relating the cue or time of retrieval to each potential response. (There are probably some deeper mathematical equivalences here, but I unfortunately don’t have the math skills to work them out; if anyone does, I’ve included the equations at the bottom of this post.)
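A minimal Python sketch of that discriminability rule, assuming SIMPLE's log-transformed temporal dimension (similarity = exp(-c·|log T_i - log T_j|)) and a hypothetical ten-item list. The value c = 2 and the presentation schedules are illustrative choices, not fitted values. The sketch also checks the scale-independence claim by stretching the whole schedule from seconds to hours.

```python
import math

def simple_discriminability(distances, c):
    """Summed-similarity discriminability on a log-time axis (SIMPLE-style).

    `distances` are each item's temporal distance from retrieval (T > 0).
    Similarity is exp(-c * |log T_i - log T_j|), i.e. (shorter / longer) ** c,
    so it depends only on the ratio of the two distances. Each item's
    discriminability is 1 / (sum of its similarities, including itself).
    """
    logs = [math.log(t) for t in distances]
    return [1.0 / sum(math.exp(-c * abs(li - lj)) for lj in logs) for li in logs]

def list_distances(n_items, spacing, delay):
    """Distances from retrieval for items presented every `spacing` units, oldest first."""
    return [delay + spacing * (n_items - 1 - k) for k in range(n_items)]

# Ten items, one per second, recalled one second after the last item.
by_seconds = simple_discriminability(list_distances(10, spacing=1.0, delay=1.0), c=2.0)
# The same schedule stretched to one item per hour and a one-hour delay.
by_hours = simple_discriminability(list_distances(10, spacing=3600.0, delay=3600.0), c=2.0)

print([round(d, 3) for d in by_seconds])
```

With these settings the first list position is a little better than its neighbors (primacy) and the final positions are best (recency). The hour-long schedule gives the same discriminabilities, because only ratios of temporal distances enter the similarity.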

The implementation of SIMPLE includes only three parameters. One governs the temporal distinctiveness between memory representations, another governs the threshold of temporal distinctiveness below which items are not retrieved (omission errors), and the third governs noise in that threshold. Using just these three parameters, SIMPLE generally captures around 80-90% of the variance in tasks as diverse as immediate and delayed free recall, proactive interference, power law forgetting curves, order reconstruction, and serial recall of both grouped and ungrouped items. It’s worth noting that these free parameters are fit to the data from these experiments, and converge to different values for different paradigms; it’s not the case that SIMPLE actually learns how to do these tasks and emergently produces these data as a product of learning, much less from the same parameter values.
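To show how three such parameters could work together, here is a Python sketch that turns discriminability into a recall probability. It assumes a logistic threshold function, P(recall) = 1 / (1 + exp(-s(D - t))), with t as the threshold and s as its slope (a steeper slope meaning less threshold noise). Treat this as one plausible reading of the verbal description above rather than Brown et al.'s exact equations; the function name and all parameter values are hypothetical.

```python
import math

def recall_probabilities(distances, c, t, s):
    """Hypothetical three-parameter recall rule in the spirit of SIMPLE.

    c: temporal distinctiveness (how fast similarity falls off with log-time distance)
    t: threshold on discriminability; items well below it tend to be omitted
    s: slope of the logistic around t (a steeper slope means less threshold noise)
    """
    logs = [math.log(T) for T in distances]
    probs = []
    for li in logs:
        discriminability = 1.0 / sum(math.exp(-c * abs(li - lj)) for lj in logs)
        # Assumed logistic threshold: P(recall) = 1 / (1 + exp(-s * (D - t))).
        probs.append(1.0 / (1.0 + math.exp(-s * (discriminability - t))))
    return probs

distances = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]  # seconds from recall, oldest item first
noisy = recall_probabilities(distances, c=2.0, t=0.3, s=20.0)
sharp = recall_probabilities(distances, c=2.0, t=0.3, s=100.0)
strict = recall_probabilities(distances, c=2.0, t=0.4, s=20.0)
print([round(p, 2) for p in noisy])
```

Raising s pushes items below threshold toward omission and items above it toward certain recall, while raising t lowers every item's recall probability.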

A similar observation has been made in the newest issue of Cognitive Science (hat tip to the phenomenal BPS Research Digest blog) by Shiffrin, Lee, Kim & Wagenmakers (PDF freely available online). These authors note that SIMPLE is a great candidate for evaluation with Bayesian statistics, and their analyses confirm that SIMPLE can produce fantastic fits to serial position effects.

However, their Bayesian analysis of SIMPLE quickly reveals peculiarities in the three free parameters used in fitting SIMPLE to each of the six modeled serial position data sets. These parameter sets are largely nonoverlapping in 3-d space, suggesting that a single parameter setting (as would presumably occur in an individual subject) would not provide good fits to all of the data. More worryingly, some of the parameter fits are systematically correlated (specifically, threshold and threshold noise appear to trade off), suggesting, in Shiffrin et al.’s words, that the parameters are nonindependent and therefore not theoretically compelling.

As an aside, Shiffrin et al. extend SIMPLE by constraining the threshold noise and temporal distinctiveness parameters to a single value across data sets and setting the threshold value as some proportion of the list length in each data set. To my eye, the extension fits the data quite well, although Shiffrin et al. emphasize that their extension was merely meant to demonstrate how Bayesian methods can reveal inadequacies in computational models and help highlight their predictions.

Despite the shortcomings noted above and by Shiffrin et al., SIMPLE is an impressive step towards a universal law of human performance across diverse tasks: only a few free parameters in a Weber-Fechner-like equation are needed to capture a number of detailed psychological phenomena, and at least two of those free parameters seem somewhat redundant. It will be important for future research to ground this apparent psychological law in the mechanisms that create a Weberian logarithmic compression of neural representations. That, to me, would be one mark of a true shift towards a “post-Newtonian” era of psychology.

Equations Referred To Above:

Brown et al.’s formula, where P(R) is the probability of recalling an item (i.e., its discriminability), T sub i is the temporal distance of an item from the point of recall, T sub j is the distance of any other item from the point of recall, and c is a free parameter governing distinctiveness.

$$P(R_i) = \frac{1}{\sum_{j=1}^{n} \exp\left(-c \, \lvert \log T_i - \log T_j \rvert\right)}$$

LSA entropy, where p(i) is the cosine between the stimulus and each alternative response divided by the sum of LSA cosines across alternative responses.

$$H = -\sum_{i} p(i) \log p(i)$$
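For comparison, the entropy measure can be sketched in a few lines of Python. The function name and cosine values below are hypothetical, the natural log is an assumed choice of base, and the sketch assumes all cosines are non-negative so that the normalized values form a valid probability distribution.

```python
import math

def response_entropy(cosines):
    """Entropy of a response distribution built from similarity scores.

    p(i) is each alternative's cosine divided by the sum of cosines across
    alternatives; entropy is -sum p(i) * log p(i), using the natural log.
    Assumes non-negative cosines.
    """
    total = sum(cosines)
    probs = [cos / total for cos in cosines]
    return -sum(p * math.log(p) for p in probs if p > 0)

one_clear_winner = response_entropy([0.9, 0.05, 0.05])
evenly_matched = response_entropy([0.4, 0.4, 0.4])
print(round(one_clear_winner, 3), round(evenly_matched, 3))
```

A stimulus with one strongly associated response gives low entropy, while evenly matched alternatives give the maximum, ln 3 for three responses. Like the denominator in Brown et al.'s formula, the calculation is driven by a sum over every competing response.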

Gordon D. A. Brown, Ian Neath, Nick Chater (2007). A temporal ratio model of memory. Psychological Review, 114 (3), 539-576. DOI: 10.1037/0033-295X.114.3.539

Richard Shiffrin, Michael Lee, Woojae Kim, Eric-Jan Wagenmakers (2008). A survey of model evaluation approaches with a tutorial on hierarchical Bayesian methods. Cognitive Science: A Multidisciplinary Journal, 32 (8), 1248-1284. DOI: 10.1080/03640210802414826

Comments

  1. #1 Vixey Fahren
    February 29, 2012

    I’m currently writing a paper on this…such a difficult topic to get your head around, even harder to explain! Good job on this though! :^)
