Technology has made substantial progress in preserving the spark of life in ever more severely damaged bodies. Preserving the quality of life of the beneficiaries of our advancing medical skills has become an increasingly difficult problem.
Neanderthal skeletons bear evidence of medical treatment and long-term attempts to accommodate the adverse effects of crippling physical injuries. But such care was largely limited to those with partial disabilities like deformed or atrophied limbs. In a primitive world where simple infections tended to be fatal, those who suffered paralyzing injuries tended to die from them before concern for the ongoing quality of their lives could become much of an issue.
Our ability to use technology to mitigate the adverse effects of physical injury has improved over time, allowing us to at least partly compensate for ever more profound degrees of impairment. While the part human/part machine cyborgs so common in today's popular media remain the domain of science fiction, even our current level of technology is starting to offer meaningful improvements for those trapped in profoundly damaged bodies.
I recently acquired an unexpected education in voice controlled technology as a byproduct of participating in a project to provide a quadriplegic with a measure of control over his environment.
Ironically, it was the consumer appliance industry's efforts to cater to the voluntary inertia of couch potatoes that made it possible to partly overcome the involuntary limitations of our paralyzed user. More and more appliances are being equipped with infrared remote controls these days. But while conventional remote control devices save the user the effort of walking across the room to change the channel on the TV, or adjust the volume on the stereo, they still require working fingers to push the buttons. A major part of our project exploited the fact that it's the infrared signals that control the appliance, not the arbitrary device that is sending those signals.
One part of our project was a Quartet Environmental Control Unit. The ECU provides several switches, a telephone, and an infrared transmitter that can be manipulated with recorded voice patterns. It comes preprogrammed with sets of typical commands used by a variety of household appliances. This allows the ECU to control pretty much anything that can be wired up to an external momentary contact switch, or that responds to infrared signals.
The user must first train the machine to associate a unique sound pattern with each command. The sound pattern doesn't have to be the actual word - it can be whatever unique sound the user is capable of vocalizing. Whatever analog sound it captures will be reduced to digital data bits stored in a database, and one data bit is much the same as another to a computer.
The ECU's telephone functions and switches only require voice training. The infrared functions are a bit more complicated since every appliance uses different coded signals to avoid conflicts with other infrared controls. The ECU learns the infrared signals for each appliance from the remote control device that came with it. During training, each remote control is aimed at the infrared receiver on the front of the ECU, and its various buttons are pushed as the ECU steps through its list of commands. The ECU captures the control signals broadcast by the remote control device, and can then mimic the various signals with its own infrared transmitter when it hears a voice pattern it recognizes as a command.
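The learn-and-replay arrangement described above can be pictured as two lookup tables: one from a recognized voice pattern to a command, and one from a command to the infrared signal captured for it. The sketch below is only an illustration of that idea, not the ECU's actual firmware; the class name `LearningRemote`, its methods, the pattern identifiers, and the pulse timings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LearningRemote:
    """Hypothetical model of a learn-and-replay controller (not the real ECU)."""
    ir_codes: dict = field(default_factory=dict)      # command name -> captured pulse timings
    voice_labels: dict = field(default_factory=dict)  # voice pattern id -> command name

    def learn_ir(self, command, pulses):
        # Store whatever the original remote sent while this command was selected.
        self.ir_codes[command] = tuple(pulses)

    def train_voice(self, pattern_id, command):
        # Any distinct sound will do; the table only needs a consistent identifier.
        self.voice_labels[pattern_id] = command

    def hear(self, pattern_id):
        # Return the pulses to replay, or None if the sound or command is unknown.
        command = self.voice_labels.get(pattern_id)
        if command is None:
            return None
        return self.ir_codes.get(command)


ecu = LearningRemote()
ecu.learn_ir("tv_power", [9000, 4500, 560, 560])  # made-up timings in microseconds
ecu.train_voice("grunt-1", "tv_power")            # the sound need not be a word
signal = ecu.hear("grunt-1")
```

Keeping the two tables separate mirrors the two training passes in the text: the voice pass and the infrared pass can each be redone without disturbing the other.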
A box of voice controlled switches and telephone functions can do a lot to extend the options of someone whose world would otherwise be limited to small head movements, and croaking out a few labored words with the air pumped into his lungs by the respirator - waiting for the respirator's next exhaust cycle to finish his sentence.
The ECU uses digital technology to provide some control over a few analog aspects of life. We live in an increasingly digital world, especially with regard to communications. The user's options can be greatly expanded when similar developments in computer technology are spliced into the project.
Voice recognition has long been one of the holy grails of software development - and has gained a reputation for being about as attainable. Voice recognition software is getting better, but is still largely subject to the same trade-off it has always suffered: the wider the variety of users it must accommodate, the narrower the vocabulary it can recognize. Conversely, the larger the system's vocabulary, the more its operation must be adapted to the individual user.
The digital operators used by telephone companies can decipher an impressive variety of regional dialects and accents, but their vocabularies are limited to just a handful of words and numbers. We wanted to provide our user with the largest possible vocabulary and thereby the maximum capability and flexibility. As a result, our project had to be extensively trained by the intended user.
The training started with the user reading specific words into a microphone so that the system could capture voice prints of how he pronounced them. The initial training only made the system marginally useful - it still got nearly as many words wrong as right. But it learns from its mistakes, and improves its accuracy over time.
The system tries to define a set of arbitrarily unique artifacts in the digitized sound patterns associated with each known word. The more times it hears a word, the more refined its identifiers for that word will become. While training allows the system to adapt to unique ways of speaking, it then requires the user to be consistent in those unique speech patterns. The more consistent the user can be in speaking, the more accurate the system will be at guessing what he said.
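One way to picture how repeated training sharpens a word's identifiers is a running average of feature vectors, with recognition done by picking the closest stored average. The software's real algorithm is not described here, so the sketch below swaps in this simple nearest-average matcher purely for illustration; `WordTemplates`, its methods, and the two-number feature vectors are assumptions.

```python
import math


class WordTemplates:
    """Hypothetical nearest-average matcher; not the actual recognizer."""

    def __init__(self):
        self.mean = {}   # word -> averaged feature vector
        self.count = {}  # word -> number of training samples heard

    def train(self, word, features):
        # Fold the new sample into the running average for this word.
        n = self.count.get(word, 0)
        if n == 0:
            self.mean[word] = list(features)
        else:
            old = self.mean[word]
            self.mean[word] = [m + (x - m) / (n + 1) for m, x in zip(old, features)]
        self.count[word] = n + 1

    def recognize(self, features):
        # Guess the word whose averaged template lies closest to the input.
        if not self.mean:
            return None
        return min(self.mean, key=lambda w: math.dist(self.mean[w], features))


templates = WordTemplates()
templates.train("on", [1.0, 0.0])
templates.train("on", [0.8, 0.2])   # a second sample refines the "on" template
templates.train("off", [0.0, 1.0])
guess = templates.recognize([0.7, 0.3])
```

In this toy model, a speaker who pronounces a word the same way each time keeps the averaged template tight and well separated from other words, which is the consistency requirement the paragraph above describes.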
Once the system can accurately understand the user's commands, it can interact with the computer's conventional graphic interface as his proxy. Moving the mouse across the screen and clicking on a button with voice commands is less convenient than manipulating the mouse directly with functioning arms and fingers. But inconvenient is still a substantial improvement over unable. Improving productivity only becomes a concern after productivity becomes a possibility.
A voice controlled computer can become a substantial "force multiplier" - especially when connected to the Internet. Once it has the user's data, a word processor or accounting program doesn't care whether that data was entered at blazing speed by a professional touch-typist, or laboriously dictated by a user who has lost the use of his fingers. Email and web browsers allow a broken, bedridden body to project a virtual presence that is indistinguishable from that of the legions of other slow typists who inhabit the globally connected community.
Much of the difficulty in the current approach is the necessity of adapting to the existing - two eyes and two hands - user interface, rather than starting from scratch with a user interface designed for the purpose. The market for voice operated computers is still too small to support the development of systems designed from the ground up to be voice operated. But as medical advances make it possible to preserve the spark of life in ever more severely injured patients, developing technologies that can respond to ever more limited means of communication will become an ever greater humanitarian challenge.