Is Voice the Future of Interaction?

6 Nov, 2011 | TechTdp

The launch of Siri with the iPhone 4S has brought voice control back into the limelight. Voice recognition software has been around for as long as I can remember; I played with some, probably back in the 90s, and it was never very good. Having used Google's Voice Search and seen Siri in action, though, I can see that natural language processing has come on a long way.

I've also seen voice control in a car first-hand recently (in a BMW, not my car) and was very impressed with how it saves the driver from taking their eyes off the road (largely) or their hands off the steering wheel. It got me thinking that, as the price is clearly starting to drop, we may be at the tipping point for voice control being rolled out to a lot more devices in the next few years.

The catch with the likes of Siri and Google's Voice Search is that the heavy lifting, the actual processing, is done on remote servers rather than on the local machine. That approach is fine for a phone, a tablet or a PC, but it won't work for the majority of devices, most of which have no connection to the outside world. What I don't know is how much of that work really needs to be done on a server.

Siri relies heavily on outside help, and not just to find the data you request: it builds on research from SRI International's Artificial Intelligence Center, and its voice recognition engine is provided by Nuance. Nuance also happens to provide voice recognition for Ford, BMW, Audi, Fiat, Mercedes, Renault and many other car manufacturers. The new Mondeo has voice control as standard. These in-car systems don't do natural language processing, though; they simply match what the driver says against a list of thousands of predefined commands to work out what is being requested.

I assume they don't need as much processing power, or a connection to the outside world, because the number of possible commands is limited. Mobile phones typically run on low-powered and (relatively) cheap ARM chips, so manufacturers could certainly add this kind of voice control to many more devices for little cost (if they aren't already using such chips to run those devices). Imagine being able to replace your light switches and simply say 'lights on' when you walk into a room.
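To make the 'list of commands' idea concrete, here is a toy Python sketch of matching against a fixed vocabulary. It assumes a recogniser has already turned the speech into text, which is the genuinely hard part and isn't attempted here, so it is not how Nuance or any car maker actually does it. The phrases and action names in COMMANDS are hypothetical, as are the helper names normalise and match_command; the only real library call is Python's difflib.get_close_matches, used to tolerate small recognition slips.

```python
import difflib
import re
from typing import Optional

# Hypothetical command list: phrase the device accepts -> action it triggers.
# A real in-car system would have thousands of these.
COMMANDS = {
    "lights on": "LIGHTS_ON",
    "lights off": "LIGHTS_OFF",
    "radio on": "RADIO_ON",
    "radio off": "RADIO_OFF",
    "call home": "CALL_HOME",
}


def normalise(text: str) -> str:
    """Lower-case the transcript, drop punctuation and collapse spaces."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def match_command(transcript: str, cutoff: float = 0.8) -> Optional[str]:
    """Map recognised text to an action, or None if nothing in the list fits."""
    phrase = normalise(transcript)
    if phrase in COMMANDS:  # exact hit on a known command
        return COMMANDS[phrase]
    # Tolerate small recognition slips ("lights onn") by picking the single
    # closest known phrase, provided it is at least `cutoff` similar.
    close = difflib.get_close_matches(phrase, COMMANDS.keys(), n=1, cutoff=cutoff)
    return COMMANDS[close[0]] if close else None


if __name__ == "__main__":
    for heard in ["Lights on!", "lights onn", "play some jazz"]:
        print(f"{heard!r} -> {match_command(heard)}")
```

Because the vocabulary is small and fixed, each request is just a dictionary lookup plus a similarity check against known phrases, which is the sort of thing a cheap chip could handle without a server. Anything outside the list (like 'play some jazz') is simply rejected rather than interpreted.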

That works for some devices but not necessarily for others. It's rumoured that Apple may launch a line of TVs that use Siri for voice control. It's a nice idea, but a TV generates its own noise, which may interfere with voice commands. I assume the set could filter out the sound it produces, but echoes could still be a problem, and people tend not to watch TV in silence anyway. Besides, what stops your kids from demanding one channel when you want to watch another? Currently, she who holds the remote rules the viewing schedule, but that won't work with voice commands. Likewise, I don't think we'll want computers, or devices in public spaces, to be voice activated (the latter for privacy reasons if nothing else), though God knows things like train ticket machines would be considerably easier to use if you could just tell them what you wanted.

So we're probably looking at applications limited to the inside of a building or vehicle, but just imagine the possibilities: programming your PVR, oven or washing machine suddenly becomes much easier. You could change the temperature on your thermostat without getting up. And that's before you consider the applications for the disabled, elderly or infirm. What about places like hospitals, where pressing buttons by hand helps spread germs?

I suppose we're going to need some standards, though; you don't want to have to learn a new vocabulary each time you buy a new device ("I must remember to say 'temperature set: 20 degrees', not 'set temperature: 20 Celsius'"). And how would you turn a lamp off without turning the main light off, or the lamp on one side of a room and not the other? But these are all things that can be overcome, or programmed around.

I'm not saying this is going to happen overnight, but, as with my predictions for smartphones and tablets, it's probably only two to five years before voice becomes commonplace.