Why it's time to reconsider speech recognition

Microsoft Sync 3.0 in autos

The latest version of Microsoft Sync 3.0 reveals an interesting trend in speech technology – one that will integrate easily with other services. Upcoming Ford vehicles in the US will feature this new technology, which is far more complex than simple voice commands.

The new service supports news split into categories such as entertainment and sports; in-vehicle directions that actually work well with the GPS features of the car (other voice-controlled GPS systems such as those from TeleNav are not quite as fluid); quick access to weather information that is activated by a voice command; and detailed traffic info.

Each of these speech options is easy to activate: for sports scores, you can just say the name of the sport. For business information, you can search for a business name and then say 'connect me' to dial that business.

There's a website – syncmyride.com – where you can configure quick access options for your favourite sports teams, choose options for traffic (such as whether you receive a text message on your mobile phone and on your Sync system in the car), and set up your favourite options.

The Sync 3.0 system would be a lot of technology without a lot of usability if it did not have a 'user interface' that worked well. In a hands-on test at Microsoft in the Seattle area, we saw how the system walks you through the options. You first press a Voice button on the steering wheel. Then you say the word 'services' to access all of the extra traffic news, turn-by-turn directions and weather information.

It's intuitive because it walks you through all of the options in a way that is not distracting. Sync 3.0 uses a new voice called Stephanie that has a soothing speech pattern and never seems overly intrusive. The service sounds as if a real person is speaking all of the interface options. Whereas some GPS voice systems seem static and even harsh, the Sync 3.0 service shows how computer-based speech can integrate into other technologies.

Another interesting benefit to using Sync 3.0 is that the system usually repeats back everything you say, which helps to improve accuracy when you search for a term like 'banking' instead of 'bakery'.
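The repeat-back behaviour can be pictured as a small confirmation loop. What follows is only a rough sketch of the idea, not how Sync is actually built: the function name `confirm_by_echo`, the list of guesses and the yes/no callback are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional


def confirm_by_echo(
    hypotheses: Iterable[str],
    user_accepts: Callable[[str], bool],
) -> Optional[str]:
    """Read each recognition guess back and return the first one the user accepts."""
    for guess in hypotheses:
        # In a car this prompt would be spoken aloud; here it is just a string
        # handed to a callback that stands in for the driver's yes/no reply.
        if user_accepts(f"Did you say '{guess}'?"):
            return guess
    return None  # no guess was confirmed


# Hypothetical recogniser output, best guess first.
guesses = ["bakery", "banking"]
# Stand-in driver who only says yes when the echoed word is 'banking'.
chosen = confirm_by_echo(guesses, lambda prompt: "banking" in prompt)
print(chosen)
```

Because the wrong guess is echoed before anything is searched, the driver gets a chance to reject 'bakery' and land on 'banking' instead.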

Sync 3.0 shows how speech technology can evolve and become a ubiquitous part of your day, not just a program you run when you need it. And what's next for mobile tech in your vehicle? The next version of Sync could support even more robust features such as dictating a document or holding an instant messaging 'conversation' where what you say is converted to text.

It's also not far-fetched to imagine another future scenario: since newer vehicles from Lexus, BMW, and Mercedes support new technologies such as self-parking, automatic lane changes, headlight dimming when another vehicle passes, and a cruise control system that senses other vehicles and speeds up and slows down automatically, it is possible that some or all of these features could be voice controlled.

For example, when you pull up to a busy shopping mall, a future version of Sync could support a command such as 'find parking spot'. At that point, GPS and on-board sensors could locate a free spot and guide you to it.

TellMe for Windows Mobile 6.5

In the US, the 1-800-555-TELL service is a brilliant example of how speech technology is evolving. FedEx is also using a similar service developed by TellMe. The speech recognition is extremely accurate, based on millions of entries in language dictionaries and computer-based interpreters that understand diction, accents, speaking speed and more.

The services are also remarkably adept at rooting out background noise and focusing on the spoken commands. The same TellMe technology is now coming to all Windows Mobile 6.5 devices. The service supports voice-to-text messaging, so you can speak your mail and have the system send an email.

There are options for business and web searches, weather, movie and traffic info, sports and stocks. Like the Sync 3.0 service, TellMe on a mobile is also easy to use – you can send a text message by saying the word 'text' and the name of your contact, dictating the message, and then saying the word 'send'.
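To make the 'text', contact, message, 'send' sequence concrete, here is a minimal sketch of how a recognised transcript might be split into its parts. It assumes the speech has already been turned into plain text and that the phone supplies a contact list; `parse_text_command` and `TextCommand` are hypothetical names, not part of TellMe or Windows Mobile.

```python
from typing import NamedTuple, Optional, Sequence


class TextCommand(NamedTuple):
    contact: str
    message: str


def parse_text_command(
    transcript: str, contacts: Sequence[str]
) -> Optional[TextCommand]:
    """Split "text <contact> <message> send" into contact and message."""
    words = transcript.lower().split()
    # The command must open with 'text' and close with 'send'.
    if len(words) < 3 or words[0] != "text" or words[-1] != "send":
        return None
    rest = " ".join(words[1:-1])
    # Try longer names first so "John Smith" wins over plain "John".
    for name in sorted(contacts, key=len, reverse=True):
        key = name.lower()
        if rest == key or rest.startswith(key + " "):
            return TextCommand(contact=name, message=rest[len(key):].strip())
    return None  # no known contact follows the word 'text'


command = parse_text_command(
    "text John Smith running ten minutes late send",
    ["John", "John Smith"],
)
print(command)
```

Matching against known contacts is one simple way to decide where the name ends and the dictated message begins; a real service would presumably lean on its recogniser's grammar rather than string matching.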

TellMe runs an entire data centre just to process these voice commands for its 1-800-555-TELL service, so the new mobile service should be very accurate.

Speech recognition on netbooks and UMPC devices

While most netbooks and UMPC (Ultra Mobile PC) devices use truncated versions of Windows XP or Windows Vista, the main issue for speech recognition is that the processing speed is not quite fast enough to interpret what you say accurately.

In testing a Lenovo S10 netbook, for example, the Intel Atom processor was just not fast enough to run Dragon NaturallySpeaking 10 and had a hard time keeping up with spoken words. The speech recognition runs, but the faster you talk, the further the dictation falls behind what you say.

We also tried speech on the Fujitsu LifeBook U820 and found that accuracy was poor there too, due to the slow Atom processor and the fact that it uses the older Windows XP OS.

The solution to this problem will likely be the same technology that Microsoft, a company called Loquendo, and others are using, which processes speech on a remote server instead of on the device itself.

As you can imagine, that kind of speech technology will require extremely fast broadband connections. In the UK, that may not be as big an issue as you might think, as trials are underway for citywide WiMax roll-outs, fibre optic delivery at 10-20 Mbps to the home and office, and emerging standards such as WiGig that could eventually replace Wi-Fi in laptops.

This cloud computing model means the device itself is not nearly as important as the connection. A netbook, mobile phone or UMPC can serve quite well as an input device for speech, and the back-end servers – likely a remote data centre – can process the speech accurately and immediately provide feedback, record your text and interpret commands.

There is one factor that could stall this speech revolution on ultra-compact devices: concerns over security. Currently, most speech technology is designed for simple voice commands. Once businesses start using speech on a netbook to 'speak' financial information and transmit it to the corporate office, encryption becomes a primary concern – and there is little encryption available over high-speed connections, especially over a 3G connection.

This roadblock can be resolved, but mobile users may continue to be wary of the technology.

-------------------------------------------------------------------------------------------------------

First published in What Laptop Issue 126

John Brandon
Contributor

John Brandon has covered gadgets and cars for the past 12 years having published over 12,000 articles and tested nearly 8,000 products. He's nothing if not prolific. Before starting his writing career, he led an Information Design practice at a large consumer electronics retailer in the US. His hobbies include deep sea exploration, complaining about the weather, and engineering a vast multiverse conspiracy.