Where next for speech recognition on the Mac?

Voice recognition needs to get smarter to become popular


Ten or so years ago we imagined the future would be all about holograms, virtual reality and voice control, but now, in 2011, we've not quite reached those lofty expectations.

While 3D TV is slowly filtering into the mass market and augmented reality has begun to replace the chunky headsets seen on 90s gameshows, voice control really hasn't made the mark we were expecting it to. So what is it about voice recognition that has left more of us typing than talking?

Voice recognition in a nutshell

To fully understand the ins and outs of voice recognition we need to look at its main uses, which fall into three distinct categories. The first is voice control: simple spoken commands that can do anything from checking for new mail to switching between applications.

Voice control within Mac OS X is an assistive technology but can be used as a quick way to handle common tasks. The same technology is used for Voice Control in iOS to switch tracks, as well as by in-car stereos to control playback, phone calls and SatNav.

Dictate the proceedings

Then there's dictation, which requires more impressive speech-recognition work. This is handled by apps from Nuance such as Dragon Dictate, which uses algorithms to learn your voice and understand what you say.

For these more advanced applications you will need a decent-quality microphone or headset, and you will need to create a profile so that your unique voice patterns can be understood accurately. This also applies to apps such as Scribe from MacSpeech, which learns your voice from audio files and can transcribe audio notes you have made into text documents.

The final category, voice search, has seen an increase in awareness and functionality with the rise of the iPhone and Android handsets. Apple recently acquired a company called Siri that specialises in voice search and Google already has voice search included as part of its Google apps.

Voice search, while not as technologically advanced as the dictation apps, picks out keywords from your requests and acts on them based on its understanding, for example by searching for nearby restaurants. This category slightly overlaps with voice control, but with the advances made by Google especially, it deserves its own category for its location-aware nature.
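To give a rough idea of what picking out keywords can mean in practice, here is a minimal Python sketch. It is only an illustration under our own assumptions: the keyword table, the action names and the pick_action function are hypothetical, and real voice search services are far more sophisticated than this.

```python
import re

# Hypothetical keyword table: each action is triggered by any of its keywords.
ACTIONS = {
    "find_restaurants": {"restaurant", "restaurants", "eat", "food"},
    "check_mail": {"mail", "email", "inbox"},
    "get_directions": {"directions", "route", "navigate"},
}


def pick_action(request):
    """Return the action whose keywords best match the request, or None."""
    # Lower-case the request and split it into a set of plain words.
    words = set(re.findall(r"[a-z']+", request.lower()))
    best_action, best_hits = None, 0
    for action, keywords in ACTIONS.items():
        hits = len(words & keywords)  # how many keywords were spoken
        if hits > best_hits:
            best_action, best_hits = action, hits
    return best_action
```

Calling pick_action("Find restaurants near me") returns "find_restaurants", while a request containing none of the listed keywords returns None. A location-aware service would then combine the chosen action with the handset's position.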

You may not know it, but your Mac actually has speech recognition technology installed by default. Try it out for yourself.


Head to System Preferences and click on the Speech button. From here, you can not only name your Mac in order to give it commands ("Computer, check my email" and so on) but you can also tell it to be constantly listening for your commands, so if you do need to switch apps and don't have a hand free, you can just say it out loud.

Amongst the many spoken commands a Mac will understand, you can even ask it to tell you a knock-knock joke. Just say "Tell me a joke", and your Mac will respond "Knock-knock", to which you must reply "Who's there?", and so on.

For more advanced tricks, head to the Command tab under Speech in the System Preferences pane and click the Open Speakable Items Folder button. Here you will find scripts for individual actions and specific applications that you can edit and rename to suit you.

To create your own shortcuts, you can simply change the name of a script that already exists or duplicate a script and edit the contents using AppleScript. If you want to change what you need to say in order to invoke a shortcut, simply change the file name of the speakable item to anything you wish to use instead.

Applications that aren't already featured in the Speakable Items folder can be added too, and shortcuts and voice commands can be adapted from existing ones or made from scratch.

When used correctly, this speech recognition is a handy tool, but it's far too easy for it to mishear a command or to mistake a conversation (or you talking to yourself) for a command. The outcome is a lot of repetition and accidental actions taking place.

There is an option available to turn a key on your keyboard into a kind of "push to talk" button, but using a finger to allow voice input kind of defeats the object of handsfree voice control. With a little tweaking and care, it's easy to control a number of applications and basic functions on your Mac without having to touch the mouse or keyboard, but it's certainly not perfect.

Talk to your phone

The iPhone and certain models of iPod also make use of speech recognition to change tracks, make calls and create playlists. By invoking Voice Control on the iPhone, a number of voice commands are available, much like speech recognition in Mac OS X.

Also like the Mac speech recognition software, voice control on the iPhone provides feedback to help ensure you select the correct command. As with desktop voice recognition, the iPhone's voice control can also be hit and miss, and you run a real risk of calling the wrong person at the wrong time or playing obscure tracks from your iTunes library by accident.

With the new second microphone in the iPhone 4, audio clarity has been dramatically improved, leading to fewer mistakes. However, it is still possible to make errors, especially when the headphones are plugged in.

Mac speech recognition software

For this section of the article, we thought it was only fair, while extolling the virtues of voice control and dictation, to attempt to write it using only our voice.

Dragon Dictate

Making use of Dragon's Dictate software we are currently sitting in front of an iMac, looking pretty strange, speaking aloud as if to a secretary. In terms of accidents, the speech recognition in Dragon software is far more accurate, as it performs a series of tests and procedures that learn your voice and build a profile for specific uses. So, even if you have a particularly unusual voice, your dictation is surprisingly error-free.

The other benefit speech recognition offers is pace. While commands spoken to your Mac may take a few seconds to execute as the computer attempts to understand what you've said, Dictate can handle large sentences at a time.

The software provides a floating window that hovers over your currently running application and enables you to perform basic dictation as well as related tasks, such as saving files, sending email and more.

With word processing the difficulty arises in distinguishing between the words you want dictated and commands such as punctuation, so you have to be very careful when adding commas and full stops. As if to illustrate the point, that last sentence took a little longer than normal due to the app thinking we wanted a comma followed by the word "is" rather than the word "commas".

One of the most important things you will learn when using software such as Dictate is that you need to speak clearly but naturally, as if you were speaking to another human being. Tiny intonations in your voice and the raising and lowering of pitch give clues to the software as to what you're trying to say, especially when using words with more than one meaning.

Dictate can work with your Mac's built-in microphone or another microphone you may be using. However, it's best to use the recommended hardware, such as the Plantronics headset we were provided with.

Headsets with a push-to-talk or mute button are the most useful as they avoid accidental inputs if you happen to clear your throat or begin a conversation with a friend or co-worker.

The application is constantly working to learn your voice in order to provide a flawless experience, and you can return to the practice tests at any point to give it a clearer idea of the way you speak. You can also create profiles for different locations where there may be background noise, such as in an office or a coffee shop (although how many of us would want to be speaking out loud to a computer in a public place?).

Despite the comma issue (which Dictate seems to think should be "congress you") it's very easy to ramble on for hours and hours, while the application hastily notes down everything you say.

Nuance also provides a piece of software called Scribe, which does largely the same job as Dictate, except it works with audio files you have recorded previously using an iPhone or another recording device.


Again, this software has to learn your voice before it can accurately transcribe your audio file and can only do so when it has a profile created. Once complete, it's a simple process of importing your audio note, checking for errors and receiving the transcribed text.

The same applies to the Dragon Dictation app available for iPhone and iPod, which does a pretty good job of recognising your voice in real-time and saving it as text.

While Dragon is the best way we have found to control applications and accurately dictate, it doesn't provide the totally hands-free experience one might expect. While it's a great deal easier to walk around the room calmly speaking your thoughts while the computer does the work, there has to be a level of editing and adjustment before you save your final copy.

Once again, as if to illustrate the point, we just changed 'savior' to the correct 'save your' in the last line. We dictated more than 50% of this article, amounting to 1000 words or so, and found we only had to weed out a few common mistakes such as similar-sounding words, grammatical errors and missing capitalisation, but it was light work in comparison with many options we've tried before.

Get bossy

It seems that speech recognition isn't quite at the level one would expect at this stage in its development. The software understands what we are saying and can accurately transcribe those words, and it can also perform basic commands based on voice input. It's perhaps the software performing the actions, rather than the engine transcribing the text, that needs further development.

Rather than simply telling a computer to check for mail, which you could do in the same amount of time with a mouse click, why can it not answer more complex questions such as "Do I have any important email?"

There would be more use in a method of using simple scripts along the lines of Google's priority inbox, which understands that when you say "important" you mean a specific set of contacts who may have emailed you.

The same is true of apps such as iCal, where scheduling meetings or events currently isn't as simple as one might think. What if you were able to say to your computer: "Set lunch with Dave tomorrow at two" and the computer understood your command, set the calendar date, emailed Dave and even went ahead and reserved a table at your favourite restaurant using an online booking form?
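As a rough illustration of the first step, turning an already transcribed sentence into something a calendar could use, here is a minimal Python sketch. It is our own assumption of how such a script might look, not a feature of iCal or any dictation product: the parse_command function, the fixed sentence pattern and the rule that small hours mean the afternoon are all hypothetical.

```python
import re
from datetime import date, datetime, timedelta

# Spoken hour words mapped to numbers (hypothetical helper table).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

# Assumed fixed phrasing: "Set <event> with <person> <today|tomorrow> at <hour>".
PATTERN = re.compile(
    r"^set (?P<event>\w+) with (?P<person>\w+) "
    r"(?P<day>today|tomorrow) at (?P<hour>\w+)$",
    re.IGNORECASE,
)


def parse_command(text, today):
    """Turn a transcribed sentence into event details, or None if unrecognised."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    hour = NUMBER_WORDS.get(match["hour"].lower())
    if hour is None:
        return None
    # Assumption: hours one to seven are taken to mean the afternoon.
    if hour < 8:
        hour += 12
    offset = 1 if match["day"].lower() == "tomorrow" else 0
    day = today + timedelta(days=offset)
    return {
        "event": match["event"].lower(),
        "person": match["person"],
        "start": datetime(day.year, day.month, day.day, hour),
    }
```

Given parse_command("Set lunch with Dave tomorrow at two", date(2011, 3, 14)), the sketch returns an event called "lunch" with "Dave" starting at 2pm on 15 March 2011. Creating the calendar entry, emailing Dave and booking the table would each need further scripts of their own.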

The technology exists, it's just about how it's applied. And here is where the crossover between desktop and mobile voice recognition is making the biggest difference.

Voice Search

Google's search facilities get better and better with each update and now, via the iPhone and Android handsets, they can provide search results based on a spoken question, taking into account your location and preferences. This is as close to true voice control as we have ever been.

Siri performed a similar job on the iPhone then mysteriously disappeared from the App Store before the announcement was made that Apple had bought the company. Following its public spats with the search-engine giant, Apple is unlikely to continue using Google's search, maps and voice recognition tools, but sees the major benefits voice recognition offers mobile phone users, hence this acquisition.

Perhaps smartphones and the explosion of powerful GPS-enabled devices are exactly what the speech-recognition industry needs - an injection of awareness to bring it into the mass market. As the world becomes increasingly mobile, with iPads and iPhones taking on more of the daily burden traditionally carried by laptops and netbooks, speech recognition is a much-needed tool, the popularity of which is likely to increase.

It won't be long before a synced phone mounted in a vehicle will respond to voice controls as standard and companies such as Ford, with its Voice Activated Sync, are leading the way. This control of devices through voice is not only convenient, but a serious safety measure to counteract the dangers of using a phone while driving.

Fancy talk or careless whispers?

With the many benefits of speech recognition, it seems strange that it hasn't quite taken off in the way some would have expected. But it appears that things are now beginning to change.

On the desktop, it seems that voice recognition is likely to remain limited to just dictation apps. The mobile platform, however, is where more exciting voice-recognition apps are beginning to emerge.

Controlling your computer with your voice isn't quite as natural as some might think and, without 100% accuracy, it leads to too many time-consuming errors. The fact is you always need to use a keyboard even if you can do the majority of tasks with just your voice, and therefore voice recognition will never truly rule as an input method.

As smartphones become more powerful and more like computers, they become the ideal tools for voice-recognition software. And when combined with a search engine such as Google's Voice Search, keyboards could almost become a thing of the past.

If it weren't for games, perhaps a manufacturer would have already attempted a completely voice-controlled device?

In a way, Apple already has, with its almost buttonless iPod shuffle. The latest shuffle still offers voice control, but buttons were reintroduced after a lack of interest from consumers in a solely voice-controlled product.

"People clearly missed the buttons," said Steve Jobs at the time. Perhaps none of us want to be limited in control options; perhaps we're a little too shy to tell our electronic devices what to do in public.

We certainly felt a little silly during the writing of this feature as we babbled away into a microphone while others looked on quizzically. Ultimately, it comes down to adoption and a sense of 'normality' from technology.

Remember, handsfree calling was once a niche feature but is now widely accepted, even if users do appear to be talking to themselves. For voice control and speech recognition, the same is true. If telling a device what to do with your voice becomes the standard, more and more people will start giving their fingers a rest.
