Secrets from Google's labs

As for improving existing language pairs, the main challenges have been understanding idioms and emerging language use (an ongoing battle) and breaking down sentence structures using artificial intelligence. For example, in Japanese the verb comes at the end of a sentence, whereas in English it typically follows the subject. When languages share similar historic roots, such as French and Spanish, the pairing is easier.

With harder-to-translate languages such as Japanese and Korean, the goal is to accumulate ever more data about the language. The more data the system has about a language's morphology, the more accurately it can translate into other languages.

Developing new language pairs, such as English to Finnish, is even more difficult. The hardest languages to pair are those with a vast morphology (the units of language and how they fit together to form meanings). Finnish in particular has a rich morphology, where one word combined with another can form an expression with an inherent meaning, such as race or gender, that the individual words on their own do not carry.
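
As a rough illustration of why morphology matters (this is a toy sketch, not Google's system, and the suffix lexicon below is invented), consider how a translator might first have to peel a Finnish-style word apart into a stem and its suffixes before it can look anything up:

```python
# Toy morphological segmenter: a sketch of why rich morphology is hard.
# The suffix lexicon here is a tiny, invented sample; a real system
# would learn segmentations from large volumes of data.

# Finnish-style suffixes and the grammatical meaning each one carries
SUFFIXES = {
    "ssa": "inessive case ('in')",
    "ni": "possessive ('my')",
    "kin": "clitic ('too, also')",
}

def segment(word):
    """Strip known suffixes off the end of a word, returning them
    in the order they attach to the stem."""
    found = []
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix):
                found.insert(0, suffix)
                word = word[: -len(suffix)]
                changed = True
                break
    return word, found

# 'talossanikin' ~ 'in my house, too': one word, a whole English phrase
stem, suffixes = segment("talossanikin")
print(stem)                              # talo ('house')
print([SUFFIXES[s] for s in suffixes])   # case, possessive, clitic
```

One word thus carries what English spreads across a prepositional phrase, which is exactly what makes pairing such languages with English hard.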

These word pairs are complex, and the more processing power you throw at the problem, the more accurate the results. Google operates two translation engines: one for public use that is faster but less accurate, and an internal (and experimental) engine that runs slower but is more accurate. The internal project runs on faster server farms, has richer data sets and uses better algorithms. "Putting more data into the system makes language translation better," says Franz Och, a Google research scientist.
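
Och's observation is the core idea behind statistical machine translation: translations are chosen by counting how humans have translated the same phrase before, so more aligned data directly means better output. The sketch below is a deliberately minimal illustration with an invented toy corpus, not a description of Google's engine:

```python
from collections import Counter

# Minimal sketch of count-based, data-driven translation. Each entry
# pairs a source phrase with a human translation; the corpus here is
# invented and tiny.
ALIGNED_CORPUS = [
    ("chat", "cat"),
    ("chat", "cat"),
    ("chat", "chat"),   # noisy alignment: 'chat' left untranslated
    ("chien", "dog"),
]

def build_table(corpus):
    """Count how often each source phrase aligns with each translation."""
    table = {}
    for source, target in corpus:
        table.setdefault(source, Counter())[target] += 1
    return table

def translate(phrase, table):
    """Pick the most frequently observed translation; the counts get
    sharper as more aligned data is added, which is Och's point."""
    counts = table.get(phrase)
    return counts.most_common(1)[0][0] if counts else phrase

table = build_table(ALIGNED_CORPUS)
print(translate("chat", table))   # 'cat': majority vote beats the noise
```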

It's interesting to note that machine translation is less about human knowledge of a language and more about data collection. Few members of the Google translation team can actually speak the languages they are translating, but they are very good at collecting the morphological data. In the end, translation is a major test of data collection and software programming prowess, and will continue to evolve – making it easier for users to both learn and use a language in their daily lives.

Computer vision search

Computer vision is one of the most difficult problems in computer science. The idea is to have a computer analyse an image and recognise what it contains using artificial intelligence.

The implications are profound: if a computer can recognise images, it can process them more accurately. Think of a bank account: if a computer could analyse a live video of you and verify your identity, your account would be much more secure. At Google, computer vision is less about security and more about indexing image data. Today, when you search for 'Lindsay Lohan', the results are based on metatag data attributed to photos of the starlet. Some of those attributions are wrong, which is why inaccurate results are sometimes returned.

Computer vision, by contrast, analyses the space between the eyes, the shape of the nose, the width of the forehead and other measurements, and compares them against a reference image. The analysis applies equally to video and photos, and it's much more accurate. In a demo at Google, Shumeet Baluja, a Google research scientist, showed how a computer vision search for George Bush returned a series of videos of the US president's recent speeches.
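
In outline, the geometric comparison Baluja describes might work like the sketch below. The landmark names, values and threshold are illustrative assumptions on our part, not Google's algorithm:

```python
import math

# Illustrative face-geometry matcher. In a real system the landmark
# measurements would come from a detector; here they are hand-made
# examples, normalised by face width so scale doesn't matter.
REFERENCE = {"eye_spacing": 0.42, "nose_width": 0.23, "forehead_width": 0.81}

def distance(candidate, reference):
    """Euclidean distance between two measurement vectors."""
    return math.sqrt(sum((candidate[k] - reference[k]) ** 2 for k in reference))

def is_match(candidate, threshold=0.05):
    """Accept the candidate if its geometry lies close enough to the
    reference; the threshold is an arbitrary illustrative choice."""
    return distance(candidate, REFERENCE) < threshold

# A face measured from one video frame, close to the reference geometry
frame = {"eye_spacing": 0.41, "nose_width": 0.24, "forehead_width": 0.80}
print(is_match(frame))   # True: within 0.05 of the reference
```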

One implication of computer vision search, once it evolves beyond simple recognition, is that the results could then be categorised. Google is focused only on detection today, but its mission is all about categorisation. Computer vision will help the company build a library of images and video that is searchable beyond mere text descriptions and metatags.

The first step in computer vision search is to analyse a database of millions of videos and images to see whether each contains a face. Baluja says most of Google's computer vision resources are currently dedicated to detection alone: is there a face? The next step is to perform pattern matching against the reference image or video.
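
Put together, the detect-then-match pipeline might be sketched as follows. This uses OpenCV's stock Haar-cascade face detector for stage one and a crude histogram comparison as a stand-in for stage two; both choices, and the file names, are our assumptions rather than anything Google has described:

```python
import cv2

# Stage 1: face detection with OpenCV's bundled Haar cascade.
# Stage 2: crude matching via histogram correlation; a real system
# would use geometric features like those sketched above instead.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def find_faces(image):
    """Return bounding boxes (x, y, w, h) for faces in an image."""
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return detector.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)

def matches_reference(face, reference_face, threshold=0.9):
    """Compare brightness histograms; correlation near 1.0 means similar."""
    hists = []
    for img in (face, reference_face):
        grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        h = cv2.calcHist([grey], [0], None, [64], [0, 256])
        hists.append(cv2.normalize(h, h).flatten())
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_CORREL) > threshold

# Usage: scan one frame, keep regions that resemble the reference image
# ('frame.jpg' and 'reference.jpg' are hypothetical input files).
frame = cv2.imread("frame.jpg")
reference = cv2.imread("reference.jpg")
for (x, y, w, h) in find_faces(frame):
    if matches_reference(frame[y:y + h, x:x + w], reference):
        print("possible match at", (x, y, w, h))
```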