At a demo last week, I was reminded how far Voice Interfaces have to go to be genuinely useful.
Quite apart from making me wonder whether voice interfaces are a solution looking for a problem, it made me think about my own experience with these technologies, since I've been involved in a number of telecom projects using them.
The first one was VoiceDial, a network-hosted voice recognition service that you could use to dial numbers by saying either the number or a name. This was about 12 years ago, when phones couldn't do this type of stuff. It had a great ad with a guy picking up his mother-in-law from the airport and having to dial her by his tag for her: 'The Old Trout'. I can't find it anywhere on YouTube. Let me know if you can find it!
The next one was a voice portal called WordUp, basically a voice recognition system that allowed you to access audio content (primarily news, sport and weather) and have your email read to you.
Both of these had major performance issues under noisy conditions (pretty normal for mobile), which meant that the best you ever got was a success rate of about 70%. Pretty poor.
Having said that, the primary customers who were interested were the blind, and there should have been a way to keep this service on. But I digress.
More recently we've had a voice interface as part of 123. The main aim was to direct customers to the most appropriate help as quickly as possible. It has recently gone through a dramatic simplification because, again, it wasn't able to achieve this; it pretty much just annoyed customers. Instead of asking you heaps of questions, it now asks one or two. Performance around complex answers was just not good enough for customers.
The demo I saw was for an integration between Microsoft Exchange and a softswitch. It produces some great benefits for customers in terms of managing fixed and mobile calling. One of the not-so-great benefits is the ability to manage your email using a voice interface. It managed about an 80% success rate from a fixed-line phone! Fortunately, the marketing person responsible for the product made the wisest statement I have heard in a demo like this for some time: "If it doesn't work, don't launch it".
So it seems that in over a decade not much tangible progress has been made: from a 70% success rate on mobile to an 80% success rate on a fixed line. You'd never tolerate that type of performance from a GUI, so why do people think it is tolerable from a voice interface?
Until it's possible to deal with all the variation in the human voice in an intelligent way, this is one technology that I would stay well clear of for mass market use.
Have you seen it genuinely work in a mass market environment?