What It's Like To Be A Bat: Seeing With Sound Via Sensory Substitution

In his famous essay "What Is It Like to Be a Bat?", Thomas Nagel argued that science's reductionist methods can never provide a complete understanding of the "subjective qualities" of consciousness. To illustrate the problem, he wrote that there was "no reason to suppose that" we would ever be able to comprehend what it's like to be a bat, because we can't truly understand the subjective experience of, for example, echolocation.

Ironically, scientific advances in "sensory substitution" technology have since demonstrated that it's possible to simulate (or stimulate) one modality (sight, hearing, touch) with sensory data from another. In one such system, a camera translates optical information into weak electrical pulses, which are then applied to the tongue - an ideal interface for sensory substitution because of its high sensitivity and large cortical representation. Users of this technology report the subjective experience of actually seeing with their tongue, if you can imagine that (note that Thomas Nagel would suggest you can't).
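To make that mapping concrete, here is a minimal sketch of how a camera frame might be reduced to per-electrode stimulation intensities. This is my own illustration under assumed parameters (a 20 x 20 electrode grid and a simple linear brightness-to-intensity mapping), not the actual device's design:

```python
import numpy as np

def frame_to_tongue_grid(frame, grid_shape=(20, 20)):
    """Downsample a greyscale camera frame (2-D array of values in 0-1)
    to a small electrode grid: each electrode's stimulation intensity is
    the mean brightness of the image patch it covers. The 20x20 grid and
    linear brightness-to-intensity mapping are illustrative assumptions."""
    rows, cols = frame.shape
    gr, gc = grid_shape
    # Crop so the frame divides evenly into electrode-sized patches
    frame = frame[:rows - rows % gr, :cols - cols % gc]
    patches = frame.reshape(gr, frame.shape[0] // gr,
                            gc, frame.shape[1] // gc)
    return patches.mean(axis=(1, 3))  # one intensity value per electrode
```

Each returned value would then modulate the amplitude of a weak electrical pulse at the corresponding tongue electrode.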

New research by Auvray et al. demonstrates how a different sensory substitution system - "the vOICe" - provides people with the ability to see through sound, without the limitations inherent in electrical stimulation of the tongue (which include irritation, pain, and large energy requirements).

Auditory sensory substitution systems like vOICe have several additional advantages:

1) the auditory system is exquisitely and simultaneously sensitive to multiple dimensions of sound (frequency, amplitude, harmony, rhythm, spectral composition, and left/right differences in onset time and, to a lesser extent, phase)

2) audition seems to have natural correspondences with spatial processing. For example, human subjects are faster to respond to an upper location when a high-pitched sound is played simultaneously, and conversely faster to respond to a lower location when a low-pitched sound is played, compared with the opposite mapping.

3) the temporal resolution of sonic information can easily be increased or decreased, which can be used to mirror the higher resolution of human vision at the center of the visual field (a toy sketch of this idea follows the list). In contrast, electrode grids are currently subject to spatial limitations because of the need to reduce interference among neighboring electrodes.
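As an illustration of the third point (my own sketch, not part of the vOICe), a left-to-right scan could allocate extra dwell time to the central columns of the image, analogous to foveal oversampling. The Gaussian weighting and the 3x center boost below are arbitrary assumptions:

```python
import numpy as np

def column_dwell_times(n_cols=174, scan_seconds=1.0, center_boost=3.0):
    """Split one scan period across image columns, giving central columns
    up to `center_boost` times more dwell time than peripheral ones.
    The Gaussian profile is an arbitrary illustrative choice."""
    x = np.linspace(-1.0, 1.0, n_cols)  # column position, left (-1) to right (+1)
    weights = 1.0 + (center_boost - 1.0) * np.exp(-(x / 0.4) ** 2)
    return scan_seconds * weights / weights.sum()  # seconds per column
```

An electrode grid has no analogous knob: packing more electrodes into the center runs into the physical interference problem mentioned above.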

There's clear appeal to the use of such a system (both as a prosthesis for the blind and for its potential to provide soldiers with 360-degree vision), but many unanswered questions remain. For example, it's still not clear exactly how these systems interface with the brain, or which visual pathways they utilize.

The human brain appears to have developed two distinct visual routes - one for more semantic or "object recognition" purposes (the ventral pathway) and another for more spatial or action-oriented purposes (the dorsal pathway).

To investigate whether visual sensory substitution with audition uses one or both of these visual streams, Auvray et al. trained 6 sighted subjects on the use of the vOICe system, which samples 32 degrees of the visual field once per second in greyscale at 174 x 64 resolution and converts it into frequencies spanning 500 to 5000 Hz. The loudness of a sound corresponds to a pixel's brightness, and its frequency corresponds to the pixel's vertical position. The visual field is scanned in columns, with the frequency distribution at any given moment representing a single column of visual pixels. So, to provide an example, two parallel horizontal lines would sound like two sine waves of different frequencies superimposed on one another for the duration of the scan, whereas a single dot would sound more like a brief "beep."
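To make this encoding concrete, here is a minimal Python sketch of the image-to-sound mapping described above (columns scanned left to right, vertical position to pitch, brightness to loudness). It is a reconstruction for illustration, not Meijer's actual implementation; the sample rate and the exponential frequency spacing are assumptions:

```python
import numpy as np

def encode_image(image, scan_seconds=1.0, sample_rate=22050,
                 f_min=500.0, f_max=5000.0):
    """vOICe-style encoding: scan a greyscale image (rows x cols, brightness
    in 0-1) left to right over one scan period. Each row maps to a sine-wave
    frequency (top row = highest pitch); pixel brightness sets loudness."""
    n_rows, n_cols = image.shape
    samples_per_col = int(scan_seconds * sample_rate / n_cols)
    freqs = np.geomspace(f_max, f_min, n_rows)  # row 0 (top) = highest pitch
    t = np.arange(n_cols * samples_per_col) / sample_rate
    audio = np.zeros_like(t)
    for col in range(n_cols):
        sl = slice(col * samples_per_col, (col + 1) * samples_per_col)
        for row, amp in enumerate(image[:, col]):
            if amp > 0:  # superimpose one sinusoid per lit pixel in this column
                audio[sl] += amp * np.sin(2 * np.pi * freqs[row] * t[sl])
    peak = np.abs(audio).max()
    return audio / peak if peak > 0 else audio  # normalize to [-1, 1]

# Two horizontal lines -> two steady superimposed tones for the whole scan;
# a single bright dot would instead produce a brief "beep" mid-scan.
img = np.zeros((64, 174))
img[16, :] = 1.0  # upper line (higher pitch)
img[48, :] = 1.0  # lower line (lower pitch)
waveform = encode_image(img)
```

Played back at the given sample rate, the two-line image produces exactly the superimposed sine waves described above.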

Blindfolded subjects wielded the cameras themselves as they were trained for 3 hours to localize a visual target in a high-contrast environment and to track that target as it moved. In a first experiment, subjects were asked to point at a black target 1 meter away, then to approach it and touch it.

Amazingly, subjects were always able to point correctly at the target, but depth perception proved to be a particularly hard skill: when the rather small targets (11 x 11 cm) were "far" away (60 cm), subjects took significantly longer to point at them. However, by the end of the 2-hour session, subjects had nearly halved their reaction times on the task, suggesting that significant learning was still taking place.

A second experiment showed that subjects were still able to localize an even smaller target (a 4 cm sphere) in depth, but that more errors were made the farther away the object was placed. Unlike in the first experiment, performance did not improve across this 2-hour session.

A third experiment demonstrated that after 1 hour of training with 10 different objects (e.g., a plant, a shoe, a book), subjects were generally able to recognize objects correctly after 1 minute of exposure, and that performance improved over the course of the 3-hour testing session. As in the first experiment, subjects eventually halved their recognition reaction times. Interestingly, one of the subjects was a musician and was able to recognize objects significantly faster than the others. Again, this suggests that the benefits of training were not fully realized with the short training periods provided here.

A fourth experiment showed that subjects were also able to make subordinate judgements within each category of object - that is, after 1 hour of training, subjects could discriminate between two different exemplars of 7 object categories (e.g., two shoes, two plants, two handbags) at a level significantly above chance.

This research demonstrates a "proof of concept" for general-purpose visual sensory substitution with audition. Even with relatively small amounts of training, subjects were able to use sound to locate and identify objects by their visual characteristics in three dimensions. In most cases, training had remarkable benefits, and the learning curves may be far from linear: proficiency with this system could conceivably continue to improve steeply with additional practice.

On the other hand, there's substantial room for improvement: the reaction times were long enough to make the system practically useless for most applications, and there's a long way to go before the vOICe is ready for real-world distances and lighting conditions. The sophistication of the vOICe's auditory encoding will need to increase in almost every way before it is ready for everyday use. But the proof is there - it's clearly possible to substitute one sensory modality with information from another, even with a relatively crude apparatus and minimal training.

Did these subjects learn what it's like to be a bat? Almost certainly not, but they may nonetheless be closer to such an understanding than many - including Thomas Nagel - predicted would ever be possible. As this and other neurotechnology becomes more developed, it will no doubt continue to inform our understanding of consciousness, embodiment, sensory immersion, and the subjective nature of experience.
