Visual images reconstructed from brain activity

Recent advances in functional neuroimaging have enabled researchers to predict perceptual experiences with a high degree of accuracy. For example, it is possible to determine whether a subject is looking at a face or some other category of visual stimulus, such as a house. This is possible because we know that specific regions of the brain respond selectively to one type of stimulus but not another.

These studies have, however, been limited to small numbers of visual stimuli in specified categories, because they are based on prior knowledge of the neural activity associated with the conscious perception of each stimulus. For example, we know that the fusiform face area responds selectively to faces, and so we can predict that a subject is looking at a face if that area is active, or at some other visual stimulus if it is not.

Researchers from the ATR Computational Neuroscience Laboratories in Kyoto, Japan, have now made a significant advance in the use of fMRI to decode subjective experiences. They report a new approach that uses decoded activity from the visual cortex to accurately reconstruct viewed images that the participants had not seen before. The findings are published in the journal Neuron.

Yoichi Miyawaki and his colleagues exploited the functional properties of the visual system for their method. Specifically, they used a feature called retinotopy, whereby the spatial relationships between components of an image are retained as visual information passes through the visual system. Adjacent parts of an image are encoded by neighbouring neurons in the retina, and this topography is preserved when the information reaches the primary and secondary visual cortical areas (V1 and V2, respectively). There, the so-called "simple" cells of the visual cortex encode the simplest components of the image, such as contrast, bars and edges.

Thereafter, the visual information is processed hierarchically through higher-order visual cortical areas (V3, V4 and so on). The "raw" data relating to the simple image components are combined, more features are added at each successive processing step, and the same information is encoded at increasingly large scales. Thus, the initially crude representation of an image becomes more refined at each stage of the hierarchy, until eventually an accurate reconstruction of the visual scene emerges into consciousness.

The researchers used functional magnetic resonance imaging (fMRI) to analyze activity in the areas involved in the earliest stages of visual processing, whilst their participants viewed a series of around 400 simple visual images, including geometric shapes and letters, during a single "training" condition. They then presented the participants with a series of completely new images, and combined the decoded fMRI signals from V1 and V2 with those from V3 and V4, all of which contain neurons that encode image contrast at multiple scales. By analyzing this activity with a specially developed algorithm, they were able to accurately predict the patterns of contrast in the novel images the participants observed.
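To make the decoding idea more concrete, here is a minimal toy sketch in Python with NumPy. It is not the authors' algorithm, and every detail is an assumption made for illustration. The "voxel" responses are simulated as random linear mixtures of patch contrasts plus noise, standing in for real fMRI data. Each local decoder is a ridge regression, which is a swapped-in technique and not the classifier used in the paper. The local image bases at several scales (1x1, 1x2, 2x1 and 2x2 blocks of patches) are combined with equal weights. The helper names (`make_bases`, `fit_decoders` and so on) are hypothetical. The point it illustrates is that decoders trained only on random patterns can reconstruct an image that never appeared in training.

```python
import numpy as np

GRID = 10  # images are 10 x 10 grids of patches

def make_bases(grid=GRID):
    # Assumed basis shapes: 1x1, 1x2, 2x1 and 2x2 blocks at every position.
    masks = []
    for h, w in [(1, 1), (1, 2), (2, 1), (2, 2)]:
        for r in range(grid - h + 1):
            for c in range(grid - w + 1):
                m = np.zeros((grid, grid))
                m[r:r + h, c:c + w] = 1.0
                masks.append(m.ravel())
    return np.array(masks)  # shape (n_bases, grid * grid)

def basis_contrast(images, bases):
    # Mean contrast inside each basis region; images are (n, grid*grid) in {0, 1}.
    return images @ bases.T / bases.sum(axis=1)

def simulate_voxels(images, weights, rng, noise=1.0):
    # Hypothetical forward model: each voxel is a random linear mixture of
    # patch contrasts plus Gaussian noise (a stand-in for real fMRI data).
    signal = images @ weights
    return signal + noise * rng.standard_normal(signal.shape)

def fit_decoders(voxels, targets, lam=1.0):
    # One ridge-regression decoder per basis, all solved jointly in closed form.
    v_mean, t_mean = voxels.mean(axis=0), targets.mean(axis=0)
    V = voxels - v_mean
    beta = np.linalg.solve(V.T @ V + lam * np.eye(V.shape[1]), V.T @ (targets - t_mean))
    return v_mean, t_mean, beta

def predict(decoders, voxels):
    v_mean, t_mean, beta = decoders
    return (voxels - v_mean) @ beta + t_mean

def reconstruct(pred_contrast, bases):
    # Equal-weight combination: each patch gets the average predicted contrast
    # of all the bases that cover it.
    return pred_contrast @ bases / bases.sum(axis=0)

rng = np.random.default_rng(0)
bases = make_bases()
weights = rng.standard_normal((GRID * GRID, 200))  # 200 simulated voxels

# Toy "training" session: 400 random binary patch images.
train = rng.integers(0, 2, size=(400, GRID * GRID)).astype(float)
decoders = fit_decoders(simulate_voxels(train, weights, rng), basis_contrast(train, bases))

# Toy "test" session: a novel image (a plus sign) never used in training.
plus = np.zeros((GRID, GRID))
plus[4:6, 1:9] = 1.0
plus[1:9, 4:6] = 1.0
test = plus.reshape(1, -1)
recon = reconstruct(predict(decoders, simulate_voxels(test, weights, rng)), bases)
accuracy = np.mean((recon > 0.5) == (test > 0.5))
print(f"patch accuracy on the novel image: {accuracy:.2f}")
```

Because each decoder only has to predict the contrast of one small region, it never needs to have seen the test image as a whole. In this toy that is what makes reconstruction of unseen images possible.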

The major advance over similar neuroimaging studies carried out in the past is the ability to accurately reconstruct images that the participants had not previously seen. This was possible because the activity recorded was that of neurons involved in the earliest stages of visual processing. These cells encode a small number of simple features, so the activity at each location is limited to a small number of different states and can be decoded with relative ease. Yet their combined activity can encode a huge number of combinations of those same simple features, and so could be analyzed to predict and reconstruct the novel images from a set of millions of candidate images.
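The arithmetic behind this point is easy to check. The sketch below rests on one simplifying assumption that is not stated above: each of the 10 x 10 patches is either "on" or "off". Under that assumption, a modest number of simple local decoders covers an astronomically large space of possible images. No category-based approach could ever be trained on each of those images individually.

```python
GRID = 10             # 10 x 10 patches per image
STATES_PER_PATCH = 2  # assumption: each patch is either contrast "on" or "off"

n_patches = GRID * GRID
n_images = STATES_PER_PATCH ** n_patches  # every on/off combination is a distinct image
n_local_decoders = n_patches              # one simple decoder per patch (single scale)

print(f"{n_local_decoders} local decoders cover {n_images:.3e} possible images")
# prints: 100 local decoders cover 1.268e+30 possible images
```

Adding decoders at larger scales (pairs or blocks of patches) increases the number of decoders only modestly. The number of images they can jointly describe stays the same.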

As the film clip above shows, the reconstructed images are accurate but not very detailed: they are reconstructions of the viewed images made up of 10 x 10 patches. However, as the algorithms and devices used for neuroimaging become more sophisticated, and as our knowledge of how the brain processes visual information advances, the ability to reconstruct images in this way will improve, and the reconstructed images will become more detailed.

The authors note that their new approach could be extended to reconstruct images that include other features such as colour, texture and motion. A similar approach could possibly be used to predict motor function from brain activity, and so could eventually lead to significant improvements in the capacity of neural prostheses and brain-computer interfaces. They even suggest that the method may one day be used to reconstruct hallucinations and dreams, which are not elicited by external stimuli but are also associated with activity in the visual cortex. Even if this were realized, it still would not constitute "mind-reading": reconstructing visual images from brain activity is one thing, but deciphering the activity underlying a complex stream of consciousness is another.


Miyawaki, Y. et al (2008). Visual Image Reconstruction from Human Brain Activity using a Combination of Multiscale Local Image Decoders. Neuron 60: 915-929. DOI: 10.1016/j.neuron.2008.11.004


This is probably one of the most amazing things I've ever seen.

That makes me giddy!
I was particularly struck by the inverse image appearing after the presented one.... Am I correct in interpreting those as afterimages?

It is not "mind reading" yet, but inferring mental states from neural activity is gaining momentum. I recall the 80% accuracy of James Haxby et al. in determining whether a subject is viewing a face, an object or a place, and more recently Haynes et al. decoding intentions from brain patterns.

Philosophically, this is the problem of "other minds", the eternal human quest to know what others are thinking, and now science is tackling it.

Extrapolating enormously, I wonder if this could eventually be used to shore up eye-witness testimony and/or identify suspects, perhaps as a supplement to, or replacement of, forensic artists' renderings based on witness interviews.

This study was linked from Slashdot yesterday, but it's much more at home here in Mo's territory. The potential practical applications will be interesting to track as the techniques progress.

This is an impressive technological achievement, but it does not really tell us anything new about the visual system - it is already well established that early visual cortices map the retina - and I very much doubt that it could ever have the sorts of "practical," "mind reading" applications that some commenters here seem to be suggesting.

I can't access the actual paper at the moment, but it seems pretty clear that the results must have been obtained under conditions of eye fixation. It is important to remember that in normal vision the eyes are in almost constant, and irregular, motion. The fovea (the part of the retina with the greatest visual acuity, where the light-sensitive cells are packed most closely together and where most of the color-sensitive cone cells are found) only receives light from about 2° of visual angle, and fairly major saccades, which foveate new parts of the scene in front of us, occur on average about 3 or 4 times every second. This means that under naturalistic conditions the "image" in the visual cortex would constantly be jumping wildly about. Furthermore, even during the fixations between these saccades, other lower-amplitude eye movements continue: microsaccades, drifts, and a constant tremor at about 90 Hz. The microsaccades, like the larger saccades that take us from one fixation to another, are irregular but non-random, and appear to be affected both by what the person is looking at and by what interests them in the scene. That is to say, they are almost certainly under cognitive control, and function in a purposeful way to extract information from the visual scene. Thus, our visual experience and the information we actually obtain through vision depend crucially on eye movements. There is also significant signal processing in the retina itself, before any information from it even reaches the brain.

It is a grave mistake to think that seeing is analogous to taking a series of static snapshots that get passed back to the brain and somehow synthesized into visual experience (note that we do not experience the visual world as jumping about constantly as our eyes move). I am rather afraid that experiments like the one reported here, which study the eye-brain system under special, unnatural conditions that make the eye work almost as if it were a digital camera, actually have the potential to mislead people about how vision really works. The problem is not that the results are wrong, or even that they do not tell us something we ought to know about the visual system (or, at least, they would if we did not know it already). Rather, especially in the case of a spectacular result like this one, they direct our attention to certain (already well known) aspects of the visual system, and away from other, less well understood aspects that might actually be far more important for understanding how seeing really works.

Martinez-Conde, S., Macknik, S. L., & Hubel, D. H. (2004). The Role of Fixational Eye Movements in Visual Perception. Nature Reviews Neuroscience, 5, 229-240.

Ölveczky, B. P., Baccus, S. A., & Meister, M. (2003). Segregation of object and background motion in the retina. Nature, 423, 401-408.

Richardson, D. C., & Spivey, M. J. (2004). Eye Tracking. In G. E. Wnek & G. L. Bowlin (Eds.), Encyclopedia of biomaterials and biomedical engineering (pp. 568-582). New York: Marcel Dekker.

peter @2: I noticed the same thing! It's gotta be. For a moment after stimulation, during the neurons' refractory period, a negative image seems to appear. You can also notice how the fovea saccades across the image, as parts of it fade in and out. You see that less on the smaller ones (and not at all on the small box, which you can focus on in its entirety). Neat stuff!


It's true that this doesn't tell us much about the visual system. I think, however, the value of this study is realized in terms of what it says about the capabilities of fMRI. Imaging has been getting a lot of bad press recently after the Logothetis paper in Nature this summer. While it's true that there are still a lot of studies with really shitty experimental design that get through because they have pretty pictures, this paper suggests that maybe the technology itself is more reliable than we thought. I was impressed.

As an animation student, I can see huge benefits in what this technology might mean in the future. Maybe not even in the next 50 years, but if things don't go too wrong, I hope many movies will be made simply by imagining. Currently even a single frame (and there are usually at least 12 frames per second) might take a lot of time to complete, but this tech may enable us to "film" things close to the speed of imagination one day. Which would obviously save a lot of time. :)

Until then millions like me will work their asses off like idiots...but that's always the way it goes, isn't it... :)


Well, I did say at the outset that this was technically impressive, and my original post was not targeting fMRI per se.

There are problems, though, with fMRI (and brain imaging methods in general) quite apart from any issues of "shitty experimental design" that may or may not arise in some cases. There are those, such as Coltheart (see his articles in Cortex in 2006), who argue that, by its nature, fMRI cannot possibly tell us anything useful about how cognition works. It can neither falsify nor verify any conceivable cognitive theory. That may be a bit strong, but even so, all those nice little pictures of brain areas lighting up all too easily capture the imagination, and in some cases (this experiment being a prime example) almost beg to be misinterpreted as lending support to - or even outright "proving" - simplistic theories of cognitive function that we know (on the basis of other solid but less flashy findings, which are often more difficult to explain and grasp) are false. Indeed, this would probably not be a particularly interesting finding, worthy of being blogged about, if it were not for the fact that, on its face, it seems to amount to a "proof" (or at least a very vivid demonstration) of the naive "snapshot," "digital camera" theory of vision. That theory is already well entrenched in "folk" visual theory, and this experiment is only likely to reinforce it. Indeed, I see that this is already happening in some of the other comments posted here, and people are already extrapolating from a bad theory of vision to a worse theory of imagination (even though the experiment in question is not about imagination, or memory, at all). Given that these results (sound as they may be) actually add nothing to our understanding of the physiology of vision, it looks to me as though this demonstration may well turn out to do more harm than good to the advancement of real scientific understanding.

Popular legend notwithstanding, Galileo did not really refute Aristotelian mechanics by dropping weights from the leaning tower of Pisa. His actual refutation was far more ingenious and conceptually complex, and his actual relevant experiments (with inclined planes) were much more rigorous, but also much less spectacular. What if someone actually had done the leaning tower experiment, however, using, say, a ball of iron and a ball of crumpled paper of comparable size? The paper ball would in fact (because of air resistance) have fallen more slowly, and Aristotle would have seemed to be vindicated. Galileo could have explained why, of course, and why this result was actually misleading, but his explanation would have been relatively complicated, and who amongst the rejoicing Aristotelians would have been listening?

Any possibility of using this sort of process to drive music "visualization", particularly for non-musicians?