Hacking Vision?

An interesting idea from Mark Changizi from RPI: can one design pictures which, when interpreted by your vision, perform a computation? Press release here (note to RPI public relations department: you should probably make it so that the webpage address of your press releases can be copied from the browser address bar. Somewhere a web designer should be shot.) and paper in Perception published here.

The basic idea is to use the orientation information we glean from looking at objects to perform computations. Thus, for example, Changizi suggests that we can represent zeros and ones via the two different orientations seen in this picture:


I See Zero and One. Taken from Changizi M, 2008, "Harnessing vision for computation" Perception 37(7) 1131 - 1134


Okay, so far so good. I definitely see a zero and a one. Now the idea is that by putting elements like this together, one can have the part of your vision system which computes these orientations perform a computation. Cool idea, no? But, try as I might, I just can't see how the gadgets described in the article work. For instance, here is the proposed NOT gate, which should flip the orientation of the input blocks:

I do not see a NOT. Taken from Changizi M, 2008, "Harnessing vision for computation" Perception 37(7) 1131 - 1134

So, is my visual system just messed up and not able to perform this computation? Do others see the computation? And if they can, then does this mean that I'm doomed to forever be not performing computations by just looking, whereas there may exist people who can do a whole eight bit adder just by looking?
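For a sense of what an eight bit adder would demand, here is a rough sketch in Python, assuming the visual gadgets give you a NOT and an OR (the two gates mentioned for the paper). The function names are my own and not from the paper, and this says nothing about whether vision can actually chain gadgets this way. NOT and OR together are enough to build AND and XOR, and from those a ripple-carry adder:

```python
# Hypothetical sketch: the two primitive gates are modeled on plain 0/1 ints.
def NOT(a):
    return 1 - a

def OR(a, b):
    return a | b

# Everything below is built only from NOT and OR.
def AND(a, b):
    # De Morgan: a AND b == NOT(NOT a OR NOT b)
    return NOT(OR(NOT(a), NOT(b)))

def XOR(a, b):
    # True when at least one input is 1 but not both.
    return AND(OR(a, b), NOT(AND(a, b)))

def full_adder(a, b, carry_in):
    partial = XOR(a, b)
    total = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(carry_in, partial))
    return total, carry_out

def add8(x, y):
    """Add two 8-bit integers bit by bit; returns (sum mod 256, carry out)."""
    carry = 0
    result = 0
    for i in range(8):  # least significant bit first
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry

print(add8(23, 42))  # (65, 0)
```

Every call to NOT or OR here stands for one gadget you would have to perceive correctly, and each full adder chains a couple of dozen of them, which gives some idea of how much "just looking" an eight bit adder would take.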

This also makes me wonder whether there are any similar concepts in other senses: perhaps in sound? (Which leads naturally to: you may think you are listening to the latest song from Band of Horses, but really you're calculating the thirtieth digit of Pi.)


Imagine the bit falling into the right box, rotate it around the cone and when it comes out in the left box, it's inverted. The problem with the rotation is you have to imagine it rotating in the direction of the light grey side. So for the 0 bit it rotates around in front of the cone, and for the 1 bit it rotates around in the back of the cone.

I do see it, but I note that I see it for NOT(0) (the right-hand figure) a bit more easily than for NOT(1) (the left-hand figure); the latter reverses on me too easily (My perception of Necker Cubes tends to switch fairly easily and rapidly between the two alternatives).

Shouldn't this one be "hacking double vision?"

By JohnQPublic (not verified) on 25 Jul 2008

Hmm. I didn't read the paper, and I admit this is a bit outside my area, so I may be missing something here.

After staring at those figures, I thought it might be something that should be looked at more from an artist's point of view and not try to pack too much into the illustrations conceptually. Because in a way it sounds like all this is is an attempt to create highly intuitive illustrations of building block thought processes.

The second pair of figures in particular suffers from the kind of science-itis that drives designers nuts. (Don't get me wrong, designers are riddled with their own weird faults.) Perspective in itself is just an optical trick, but here the image is confounded by the ambiguity Kevin C. noted as the Necker cube -- a wire frame with missing or faulty depth cues. It's putting you in the position of having to do so much conscious work to resolve the intended positions and purpose of the objects depicted, that it defeats the whole purpose of intuitive coding.

I can't help thinking that there's some mumbo-jumbo here purposely hiding the fact that the authors were just too cheap to work with designers on this project. Clear illustration, like clear writing, is just so... mundane. Doing useful stuff with mind bending illusions is a cool idea, I'm just not feelin' it here.

By Radge Havers (not verified) on 25 Jul 2008

I can see how it's supposed to work, but it just seems too clumsy. I think that the cone is somehow supposed to "guide" the flip, but it's too easy for me to see the whole image and detect the same orientation as the input. Also, my brain keeps wanting to go Escher on me with that hinge construction.

By John Moeller (not verified) on 25 Jul 2008

Does this even have to be in 3D? The article mentions that transparency plays a role for the OR-gate.

Changizi says "our visual system (the hardware) would automatically and effortlessly generate a perception, which would inform us of the output of the computation." I don't see how this would be effortless. Computing the outputs of primitive elements is easy in any scheme. The hard part is keeping all the current outputs in one's head.

Koray: >>>"Does this even have to be in 3D?...The hard part is keeping all the current outputs in one's head."<<<

I think you're right, and anyway the printed page, and the way people view it, offers limited throughput for a "mind hack." I suppose you could maybe force some kind of computation on the brain with flashing lights and animation, but I'm guessing you'd at least want some way to monitor and direct the output as it's being generated. Imagine an unchecked buffer overflow... ouch.

All art is just a mind hack anyway -- even the best is just an elaborate illusion. In concert with the way perception generally works, artists will design their work so that the completed image first makes an instant and comprehensive statement and then maybe invites closer inspection. I've noticed technically oriented people sometimes take an inverse approach, especially with things like maps, and try to pack as much detail as possible into an image. The result is something more suited to reading than viewing -- with the eye free to wander the page and gather whatever "words" (symbols, juxtapositions) are deemed necessary to make a thought at the time, ignoring the rest.

Imagine trying to plow through the nasty flow chart of optical illusions that would contain all the steps necessary to complete a useful job. And then there is, as you say, the problem of simultaneously maintaining the outputs.

By Radge Havers (not verified) on 27 Jul 2008

All I can tell is that NOT(1) is messed up. The geometry is all weird. Maybe that is what makes it a 1?

The 0 has clean geometry. Maybe that's what makes it clear/transducive.

Does NOT seeing a computation = seeing a NOT computation?!