Artificial networks see illusions, too

You've seen this illusion before, right?

[Figure: the grid illusion (left) and a graph of actual vs. perceived brightness along the marked path (right)]

The "grid" defining the light gray squares on the left side of this figure seems to get lighter where the lines intersect. The graph on the right shows that the actual reflectance (or brightness when depicted on a computer screen) of the figure does not change along the path marked by the blue line. But perceived brightness (indicated in red on the graph) does change.

But what's really interesting about this graph is that the thing doing the perceiving isn't a human. It's an artificial neural network. Auntie Em has the details:

The brain in question was an artificial neural network (ANN) that only ever existed inside a computer. It was trained to successfully perform on a lightness constancy task. Most excitingly, when trained to discern between overlapping layers, the ANN sees White's illusion (Box E). White's illusion has been problematic to model as the lightness perception goes "the other way" from the stimuli shown here. Thus, the by-product of learning to see lightness and depth is a susceptibility to these illusions. This also tells us something about how animal brains, including our own, work.

Here's that panel showing White's illusion:

[Figure: White's illusion (Box E), with actual vs. perceived brightness]

Fascinating, isn't it? You can read the entire study, free, on PLoS.
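If you'd like a concrete feel for what "trained on a lightness constancy task" can mean, here's a minimal sketch of such a setup. The architecture, patch size, and training details below are illustrative assumptions of mine, not the network Corney & Lotto actually used: a tiny net sees raw luminances (reflectance times illumination) and must report the reflectance of the centre patch.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_batch(n, patch=5):
    # Each sample: a patch of surfaces with random reflectances under one
    # random illumination level; the target is the centre reflectance.
    refl = rng.uniform(0.1, 1.0, size=(n, patch * patch))
    illum = rng.uniform(0.2, 2.0, size=(n, 1))
    lum = refl * illum                       # what the "eye" actually receives
    target = refl[:, (patch * patch) // 2]   # true reflectance of the centre patch
    return lum, target

# One hidden layer, trained by plain gradient descent on squared error.
n_in, n_hid = 25, 20
W1 = rng.normal(0, 0.2, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.2, (n_hid, 1));    b2 = np.zeros(1)
lr = 0.05

for step in range(5000):
    x, y = make_batch(256)
    h = np.tanh(x @ W1 + b1)
    pred = (h @ W2 + b2).ravel()
    err = pred - y
    dW2 = h.T @ err[:, None] / len(y); db2 = err.mean(keepdims=True)
    dh = err[:, None] @ W2.T * (1.0 - h ** 2)
    dW1 = x.T @ dh / len(y); db1 = dh.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2; W1 -= lr * dW1; b1 -= lr * db1

# Constancy check: the same surfaces under dim vs. bright light should be
# reported roughly alike, even though the raw luminances differ fivefold.
scene = rng.uniform(0.1, 1.0, size=(1, 25))
for illum in (0.3, 1.5):
    h = np.tanh((scene * illum) @ W1 + b1)
    print(f"illumination {illum}: reported lightness {(h @ W2 + b2).item():.2f}, "
          f"true centre reflectance {scene[0, 12]:.2f}")
```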


Any theory of vision, or any good model, should "fall" for visual illusions "automatically", without being specially designed to do so. So this is good. On the other hand, artificial networks may do their thing well, but their capability is embodied in myriads of coupling constants, so it is difficult, if not impossible, to "understand" _why_ they work.

Michael, you point to the difficulty of understanding the way in which high level functionality emerges from the low level operations of an artificial neural network as if it were an undesirable property which we should strive to remove. The operating characteristics of the brain are also embodied in myriads of coupling constants. It seems unfair to expect models of the brain to be qualitatively less nonlinear than the brain itself. Analytically tractable models certainly have their place, but the appearance of such an illusion in a more straightforward model would probably not be as surprising, and there is little reason to believe that a model with an organization fundamentally different from that of the brain would be able to capture the basis of the brain's computational capabilities.

Dear mean3monkey, you may very well be right. It would markedly alter the meaning of "understanding" the brain for me, though. And I would be disappointed, just as I am by current chess programs -- they beat human champions, but they do it by brute force, not by "understanding" the game. The concept of chunking, for instance, is alien to these programs.

But why do you cite "linearity" as the dividing line between analytical models and ANNs? And why shouldn't "high level functionality emerge from the low level operations" of an analytical approach -- which of course would have to be non-linear (as most models are), since otherwise it would merely conserve information?

BTW: that paper on the low ongoing firing rates of cortical neurons & metabolism, which you allude to in your home blog, could be: Lennie P (2003) "The cost of cortical computation", Curr Biol 13:493-497.

Anyway, as to understanding vision I may have to let go of a dear habit; thank you for the nudge.

Michael,

I mention linearity because one can predict the behavior of a linear system for arbitrarily long periods of time without actually simulating every step. I would call such a predictable system analytically tractable. That is, I'm equating prediction with analysis. In contrast, nonlinear dynamical systems are, in general, chaotic, and thus beyond prediction in the long run. Even traditional computer programs are fundamentally unpredictable: the halting problem is incomputable. The only way to tell what an arbitrary computer program will do is to simulate it. The brain is Turing complete (ignoring the limitations on its memory); indeed, I can myself simulate a Turing machine (with the aid of some paper). Consequently, my behavior cannot be predicted in general, and I would claim I am beyond analysis.
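As a toy illustration of that point (the logistic map rather than a brain, obviously): two trajectories started a hair's breadth apart become completely uncorrelated within a few dozen iterations, so there is no shortcut past simulating every step.

```python
# Logistic map in its chaotic regime (r = 3.9): an initial difference of
# 1e-10 grows until the two trajectories bear no relation to each other.
x, y = 0.4, 0.4 + 1e-10
for step in range(1, 61):
    x, y = 3.9 * x * (1 - x), 3.9 * y * (1 - y)
    if step % 15 == 0:
        print(f"step {step:2d}: |x - y| = {abs(x - y):.2e}")
```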

I'm afraid I don't know what you mean by "understand." I would claim that we will have understood the brain when we can simulate its operation. The simulation is the understanding. In this sense, we can certainly understand an artificial neural network with arbitrary coupling constants, even if its operation doesn't seem particularly intuitive. In contrast, while we can currently simulate the operation of individual neurons, we cannot accurately simulate even a single cortical column, since we know so little about the statistics of neuronal interconnections.

I would argue that, by definition, computer programs that win in chess understand the game. Their understanding underlies their ability to play well. There's no little homunculus sitting inside your head "understanding" chess while you play. You're just executing some algorithm. Admittedly, a very complicated algorithm, but an algorithm nonetheless.

I suppose what I'm really getting at is that I don't think there's any reason to be disappointed by (initially) non-intuitive algorithms. Our introspection into the operation of our minds is notoriously inaccurate. Cognitive science is a useful tool, but the metaphors we create for the lower-level operations of the brain need not have direct physical instantiations. Simply being able to describe what's going on in the brain (at a systems level, rather than a cellular or cognitive level) would mean that we have our hands on the thing itself. The intuitive understanding would come with time. When you first look at a complicated proof or algorithm, it often seems nonintuitive. You can follow each step, but it is unclear how the whole emerges from the parts. With continued contemplation, everything starts to fit together and make intuitive sense.

"There's no little homunculus sitting inside your head "understanding" chess while you play. You're just executing some algorithm. Admittedly, a very complicated algorithm, but an algorithm nonetheless."

Wait, it's impossible that a human "chess-playing algorithm" could invoke introspection? Sorry if that's too intentionally obtuse to serve as sarcasm, but your explanation of how the brain works seems to boil down to behaviorism, which gets points for being old-school, but perhaps loses points when it comes to predicting human behavior.

I don't want to rag on the paper, but it seems like the extension from the network to human psychology was probably not supposed to be a real focus. For example, this claim is so far off, it's laughable:

"These data suggest that "illusions" arise in humans because (i) natural stimuli are ambiguous, and (ii) this ambiguity is resolved empirically by encoding the statistical relationship between images and scenes in past visual experience."

Well, that's probably partly true. Ambiguity forces our visual system to make assumptions, and making assumptions is something that our brains are (generally) very good at. But that's not why their specific illusions work. Instead, nearly every illusion they used stems from VERY well-understood principles of center-surround receptive fields and lateral-inhibition in the early visual system. To imply that past visual experience plays a fundamental role indicates that there is a big disconnect between the fields of computational biology and visual cognition, and that's a shame. They're like two bashful teens at a party, who want to chat but can't muster up the courage to say 'hi!'

If looking up and working out the center-surround stuff doesn't sound like anyone's cup of tea, suffice to say that your eyes are essentially contrast-detectors and not luminance-detectors. Surrounding equiluminant patches with dark vs. light will produce a brighter / darker percept, respectively, and that's almost entirely the result of the different signals your eyes and LGN have produced based on relative contrast in the original image. I agree that it's cool to find similar behavior in an artificial network, but we already pretty much know why these illusions happen, and it ain't an experience thing.
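For anyone who does want to work it out, here's a rough sketch (the patch sizes, gray levels, and filter scales are arbitrary choices of mine): a difference-of-Gaussians filter, a standard stand-in for a centre-surround receptive field, responds with opposite sign to two physically identical gray patches, depending only on their surround.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.zeros((100, 200))
img[:, :100] = 0.2            # dark surround (left half)
img[:, 100:] = 0.8            # light surround (right half)
img[40:60, 40:60] = 0.5       # gray patch on the dark surround
img[40:60, 140:160] = 0.5     # physically identical patch on the light surround

# Centre-surround response: small excitatory centre minus larger inhibitory surround.
dog = gaussian_filter(img, 2) - gaussian_filter(img, 6)

print("response at patch on dark surround :", round(dog[50, 50], 3))   # positive -> looks brighter
print("response at patch on light surround:", round(dog[50, 150], 3))  # negative -> looks darker
```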

Their extension, that we should expect to see these illusions in every visual system that deals with ambiguity, is a great example of a prediction that will bear out for the wrong reasons. We'll see these illusions in any visual system that takes, as input, photoreceptor activity that is modulated by local contrast.

I suspect you'd find a lot of those.

Hi,
I'm the co-author of this paper, so first I'd like to thank Dave for linking to it. It's nice to be noticed!

Brian's point that we see lightness illusions because of lateral inhibition and centre-surround receptive fields is true, and we never dispute that. But that is the immediate / proximal cause - what is the distal cause? I.e., why do we have those physiological features in the first place? Or specifically, why do we have brains wired up in such a way that we see these illusions? I think the best explanation is from the statistics of past experiences, including those of our evolutionary ancestors. We see these illusions (and everything else) because of these past experiences: the relevant statistics of those experiences are captured in the human visual system as lateral inhibitory connections, centre-surround physiology etc. If evolution had taken a different route, we might have a very different physiology, but it would still capture the same statistics, and we would still see these lightness illusions - or at least, that's our prediction.

As for understanding how and why these networks see the illusions, that's a good question and one we hope to address in future work. Do they capture the relevant statistics in centre-surround fields? What statistics are relevant? A useful approach might be to consider the functional mapping that the networks carry out, where inputs (scenes) that are physically different but yield similar responses get transformed internally to have more similar representations, while physically similar & behaviourally different scenes are transformed to have different internal representations. But that work is at an early stage.
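As a sketch of the kind of comparison I have in mind (the "network" below is just a random stand-in, not one of our trained models), one can ask whether two scenes that differ only in illumination end up closer together in a hidden layer than they are as raw images; with a constancy-trained network the interesting question is whether the hidden distance shrinks for illumination-only changes.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(0, 0.2, size=(25, 10))    # placeholder weights, not a trained model
hidden = lambda x: np.tanh(x @ W)        # stand-in for whatever layer is being probed

scene = rng.uniform(0.1, 1.0, size=25)   # surface reflectances of one scene
dim, bright = scene * 0.3, scene * 1.5   # same scene under two illuminations

d_in = np.linalg.norm(dim - bright)
d_hid = np.linalg.norm(hidden(dim) - hidden(bright))
print(f"input distance: {d_in:.3f}   hidden distance: {d_hid:.3f}")
```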

Thanks for your interest.

1. @mean3monkey: "Understanding", yes, what is that? I used quotation marks around the word on purpose, because a lot of baggage comes with that term, and we clearly differ in our interpretation. As I said, a brute-force approach to chess does not imply "understanding" to me -- see the "chunking" I alluded to. It reminds me of the Chinese Room example.

2. @Brian: "... nearly every illusion they used stems from VERY well-understood principles of center-surround receptive fields and lateral-inhibition ..." I beg to differ here. First, the Hermann grid illusion: it has recently been convincingly shown that the lateral inhibition account is, at least, incomplete (demo here, sorry to amplify myself). For an opposite example, see the Munker-White illusion. This is not based on lateral inhibition (the effect would be opposite), yet it is found by Corney & Lotto -- a major success of their model IMHO, even if the "understanding" is as yet incomplete.