Secrets from Google's labs

Google has become a behemoth of innovation and a magnet for intellectual capital.

Yet, despite its obviously self-written Google Finance summary that says it "maintains an index of websites and other online content," the company is actually a dual-purpose entity. It's a very successful experiment in social engineering (where people flock to its Internet properties) and a vast advertising network (where those same people see countless ads).

Google has tapped – like no other company – the power of several million, or perhaps a billion, sites that serve its ad links, usually for free. It's an amazing concept: if you build powerful and useful tools and establish your company as an Internet oracle, you can attract millions of people to your advertising network and fuel even more innovation. The flip side is that if Google fails at innovation, people will stop using its ad system and its revenue could start to fizzle out. We're here to tell you: that is not going to happen, because we have seen the future of Google in the form of its ongoing research projects.

Universal 'one-box' search

Universal search – or 'one box' search – has to do with how the company presents search results. In 2007, in a subtle yet important change, Google shifted from presenting just text links to more universal results that include photos, news, blog entries, video and even book excerpts.

David Bailey, a Google engineer, says the company is experimenting with the algorithms for universal search. For example, if you search for 'Martin Luther King' you may see more archival information, such as book excerpts, and far fewer news reports. If you search for a movie star, you may see more news, YouTube videos or photos. The implication here is that Google is categorising through artificial intelligence: during the split second that the company analyses its database of web indexes, it's also analysing the term, figuring out how to present the UI so it's more focused on images, text or video. Yet it's going deeper than that. For video, as an example, it's analysing the file size, codec, star ratings and other data to determine the best video results.

Google has also moved away from 'operators' such as 'movie:iron man' that dictate results. Power users can still use this search syntax, but Google automatically looks at your search term (for instance, 'Iron Man'), knows it's a recent movie and so presents showtimes and reviews. Universal search is also becoming more 'web 2.0' aware and crawls through details such as Yelp.com restaurant reviews or Zillow.com home prices to present more detailed results. "A big part of my job is to shine a spotlight into all these remote corners of the web," explains Bailey.
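To make the idea concrete, here is a minimal sketch of how an explicit operator and an inferred category might both lead to the same choice of result types. It is an illustration only, not Google's method: the lookup table, the category names, the result-type lists and the function names are all hypothetical, and a real system would infer categories from its indexes rather than from a hand-written table.

```python
import re

# Hypothetical table of known search terms and their categories.
KNOWN_ENTITIES = {
    "iron man": "movie",
    "martin luther king": "historical_figure",
}

# Hypothetical ordering of result types for each category.
RESULT_MIX = {
    "movie": ["showtimes", "reviews", "video", "web"],
    "historical_figure": ["books", "web", "images", "news"],
    "general": ["web", "news", "images"],
}

# Matches operator syntax such as 'movie:iron man'.
OPERATOR = re.compile(r"^(?P<op>\w+):(?P<term>.+)$")


def classify(query):
    """Return (category, term) for a raw query string."""
    query = query.strip()
    match = OPERATOR.match(query)
    if match:
        # An explicit operator dictates the category.
        return match.group("op").lower(), match.group("term").strip()
    # Otherwise infer the category from the term itself.
    return KNOWN_ENTITIES.get(query.lower(), "general"), query


def result_layout(query):
    """Return (term, ordered result types) for a query."""
    category, term = classify(query)
    # Unknown categories fall back to the general mix.
    return term, RESULT_MIX.get(category, RESULT_MIX["general"])


for q in ("movie:iron man", "Iron Man", "Martin Luther King", "weather"):
    print(q, "->", result_layout(q))
```

In this sketch both 'movie:iron man' and a plain 'Iron Man' end up with showtimes and reviews first, while 'Martin Luther King' puts book excerpts ahead of news.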

What is actually happening with universal search is that Google keeps a separate index for each category: photos, web pages, blog entries and so on. These indexes are database files that are almost continually updated by crawling the web and searching through millions of URLs. In a very real sense, the heart of the company – these indexes – depends on the processing power of the Google server farms that crawl the web. Bailey says that categorising across all of these indexes requires more and more data centres, more electricity and more processing power as the web continues to expand. At the moment it's hard to see what the end result of universal search could be, because it will continue to evolve and become more intelligent.

Language translation

Most of Google's efforts in translation have revolved around 'language pairs' – the translation from one language to another. It has focused on two areas: developing new language pairs and improving the algorithms used for translating pairs.