Google's Alfred Spector on voice search, hybrid intelligence and beyond

Alfred Spector is Vice President of Research and Special Initiatives at Google

Google has always been tight-lipped about products that haven't launched yet. It's no secret, however, that thanks to the company's bottom-up culture, its engineers are working on tons of new projects at the same time.

Following the mantra of 'release early, release often', the speed at which the search engine giant is churning out tools is staggering. At the heart of it all is Alfred Spector, Google's Vice President of Research and Special Initiatives.

One of the areas Google is making significant advances in is voice search. Spector is astounded by how rapidly it's come along.

The Google Mobile App features 'search by voice' capabilities that are available for the iPhone, BlackBerry, Windows Mobile and Android. All versions understand English (including US, UK, Australian and Indian-English accents) but the latest addition, for Nokia S60 phones, even introduces Mandarin speech recognition, which – because of its many different accents and tonal characteristics – posed a huge engineering challenge.

It's the most spoken language in the world, but as it isn't exactly keyboard-friendly, voice search could become immensely popular in China.

Technology challenge

"Voice is one of these grand technology challenges in computer science," Spector explains. "Can a computer understand the human voice? It's been worked on for many decades and what we've realised over the last couple of years is that search, particularly on handheld devices, is amenable to voice as an import mechanism.

"It's very valuable to be able to use voice. All of us know that no matter how good the keyboard, it's tricky to type exactly the right thing into a searchbar, while holding your backpack and everything else."

To get a computer to take account of your voice is no mean feat, of course. "One idea is to take all of the voices that the system hears over time into one huge pan-human voice model. So, on the one hand we have a voice that's higher and with an English accent, and on the other hand my voice, which is deeper and with an American accent. They both go into one model, or it just becomes personalised to the individual; voice scientists are a little unclear as to which is the best approach."

Machine translation

The research department is also making progress in machine translation. Google Translate already features 51 languages, including Swahili and Yiddish. The latest version introduces instant, real-time translation, phonetic input and text-to-speech support (in English).

"We're able to go from any language to any of the others, and there are 51 times 50, so 2,550 possibilities," Spector explains.

"We're focusing on increasing the number of languages because we'd like to handle even those languages where there's not an enormous volume of usage. It will make the web far more valuable to more people if they can access the English-or Chinese language web, for example.

"But we also continue to focus on quality because almost always the translations are valuable but imperfect. Sometimes it comes from training our translation system over more raw data, so we have, say, EU documents in English and French and can compare them and learn rules for translation. The other approach is to bring more knowledge into translation.

"For example, we're using more syntactic knowledge today and doing automated parsing with language. It's been a grand challenge of the field since the late 1950s. Now it's finally achieved mass usage."

The team, led by scientist Franz Josef Och, has been collecting data for more than 100 languages, and the Google Translator Toolkit, which makes use of the 'wisdom of the crowds', now even supports 345 languages, many of which are minority languages.

The editor enables users to translate text, correct the automatic translation and publish it. Spector thinks that this approach is the future. As computers become even faster, handling more and more data – a lot of it in the cloud – machines learn from users and thus become smarter. He calls this concept 'hybrid intelligence'.

"It's very difficult to solve these technological problems without human input," he says. "It's hard to create a robot that's as clever, smart and knowledgeable of the world as we humans are. But it's not as tough to build a computational system like Google, which extends what we do greatly and gradually learns something about the world from us, but that requires our interpretation to make it really successful.

"We need to get computers and people communicating in both directions, so the computer learns from the human and makes the human more effective."