Augmented reality is being hailed as the next big thing on mobile devices, for everything from gaming to checking out house prices; point a smartphone running Yelp at a city street and the Monocle feature gives you reviews and ratings for the businesses there.
Google Goggles can do much the same thing but Goggles product manager Shailesh Nalawadi isn't that interested in AR: "We don't really see ourselves as working on augmented reality; we are working on visual search."
That's the way humans work, he points out, so it only makes sense for search to work visually. "80% of information is consumed visually, and not through any of the other senses. Why is it you have to translate and transcribe what you see into words?"
Although Nalawadi is realistic about what you can do with a phone camera today ("Image recognition is really hard, although there's no shortage of Hollywood movies that show this as already achieved!"), he doesn't think we need location-aware spectacles or the digital contact lenses predicted to be on sale by 2020; as the name Goggles suggests, the phone is the viewer.
"You have these really fast computers that we all carry around in our pockets, that have capabilities you wouldn't have had in desktops a mere five or 10 years ago," he says.
"Of course, no matter how powerful these devices are, computer vision has a way of taking up all the CPU cycles you have, so Goggles takes care of the heavy lifting over in the cloud."
That only works because of mobile broadband connections, and while Nalawadi calls the three to eight seconds of recognition time that Goggles usually takes "pretty phenomenal", he admits: "We also realise it is not enough, because people's attention spans are really short."
Far from finished
Goggles is the fruit of three to five years of research and it's far from finished, he says.
"We struggled and we built this thing where you can move your camera, point it at an object and have it come back and tell you what it is that it's looking at. The reality is we're really far from that state."
What Nalawadi wants to do is much more ambitious than just layering information about where you are on screen; he wants visual search that can deal with the whole world.
"Primarily," he explains, "it's about extending the recognition capabilities of our computers. Right now we have a very narrow set of tens of millions of objects that we recognise, but the world is much larger than tens of millions of objects and it's a phenomenal effort to try and get this info into our database and recognise it."
SEE AND SEARCH: Goggles treats landmarks like logos and barcodes; it tells you what you're looking at and brings up search results
Over time Goggles will recognise plants and chess games, and soon it will translate text on things you see.
Making the database of images Goggles can match bigger is only the start. There's the basic search problem: "We are spending a lot of time on search quality; when there is a successful match, what are the relevant results that need to come back?"
A broader problem
But Nalawadi also wants to tackle a much broader problem. "Currently there is this notion that augmented reality is all about the display of curated geodata. I think there's way more information in that scene, and you really need image recognition overlaid on top of this to give more information about what is going on around you."
He also thinks image recognition will make for a better augmented reality experience than the approximate positioning today's smartphones can calculate.