Augmented reality is being hailed as the next big thing on mobile devices, for everything from gaming to checking house prices: point a smartphone running Yelp at a city street and its Monocle feature gives you reviews and ratings for the businesses there.
Google Goggles can do much the same thing, but Goggles product manager Shailesh Nalawadi isn't that interested in AR: "We don't really see ourselves as working on augmented reality; we are working on visual search."
That's the way humans work, he points out, so it only makes sense for search to work visually. "80% of information is consumed visually, and not through any of the other senses. Why is it you have to translate and transcribe what you see into words?"
Although Nalawadi is realistic about what you can do with a phone camera today ("Image recognition is really hard, although there's no shortage of Hollywood movies that show this as already achieved!"), he doesn't think we need location-aware spectacles or the digital contact lenses predicted to be on sale by 2020; as the name Goggles suggests, the phone is the viewer.
"You have these really fast computers that we all carry around in our pockets, that have capabilities you wouldn't have had in desktops a mere five or 10 years ago," he says.
"Of course, no matter how powerful these devices are, computer vision has a way of taking up all the CPU cycles you have, so Goggles takes care of the heavy lifting over in the cloud."
That only works because of mobile broadband connections, and while Nalawadi calls the three-to-eight-second recognition time that Goggles usually takes "pretty phenomenal", he admits "we also realise it is not enough, because people's attention spans are really short."
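That division of labour - cheap preprocessing on the phone, expensive matching on a server - can be sketched in a few lines. This is purely illustrative and not Google's actual protocol or code; the tiny "database", the function names and the nearest-match scoring are all invented for the example.

```python
# Illustrative sketch (not Google's actual pipeline): the phone does cheap
# work (shrinking the image so the upload is small and fast), while the
# CPU-hungry matching against a large image database runs in the cloud.

def client_prepare(pixels, max_side=64):
    """Phone side: downsample a square grayscale image to keep the upload small."""
    side = int(len(pixels) ** 0.5)
    step = max(1, side // max_side)
    return [pixels[y * side + x]
            for y in range(0, side, step)
            for x in range(0, side, step)]

def cloud_recognise(thumbnail, database):
    """Server side: the 'heavy lifting' - compare the upload against every
    known image and return the label of the closest match."""
    def distance(candidate):
        return sum((a - b) ** 2 for a, b in zip(thumbnail, candidate))
    return min(database, key=lambda label: distance(database[label]))

# Toy database of two 2x2 "landmark images" (values are grayscale pixels)
database = {
    "Eiffel Tower": [10, 200, 10, 200],
    "Golden Gate": [200, 10, 200, 10],
}
snap = client_prepare([12, 198, 8, 201])
print(cloud_recognise(snap, database))  # -> Eiffel Tower
```

In a real system the matching would use robust image features rather than raw pixels, and the database would hold tens of millions of entries - which is exactly why it lives in the cloud rather than on the handset.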
Far from finished
Goggles is the fruit of three to five years of research and it's far from finished, he says.
"We struggled and we built this thing where you can move your camera, point it at an object and have it come back and tell you what it is that it's looking at. The reality is we're really far from that state."
What Nalawadi wants to do is much more ambitious than just layering information about where you are on screen; he wants visual search that can deal with the whole world.
"Primarily," he explains, "it's about extending the recognition capabilities of our computers. Right now we have a very narrow set of tens of millions of objects that we recognise, but the world is much larger than tens of millions of objects and it's a phenomenal effort to try and get this info into our database and recognise it."
SEE AND SEARCH: Goggles treats landmarks like logos and barcodes; it tells you what you're looking at and brings up search results
Over time Goggles will recognise plants and chess games, and soon it will translate text on things you see.
Making the database of images Goggles can match bigger is only the start. There's the basic search problem: "We are spending a lot of time on search quality; when there is a successful match, what are the relevant results that need to come back?"
A broader problem
But Nalawadi also wants to tackle the much broader problem. "Currently there is this notion that augmented reality is all about the display of curated geodata. I think there's way more information in that scene and you really need image recognition overlaid on top of this to give more information of what is going on around you."
He also thinks image recognition will make the augmented reality experience better than the approximate position today's smartphones can calculate.
"To solve augmented reality, you need to have an extremely accurate location. The current methods of extraction from handsets are really insufficient, and in my view that leads to a poor user experience where you're looking in this direction and the app thinks you're looking over there - we really need to crack that nut before augmented reality apps become interesting.
"We think computer vision is the solution to that; we can use computer vision techniques to supplement that data that's coming from the handset."
That kind of recognition would get away from the need for everywhere you go to have been mapped and annotated for augmented reality in advance; instead of telling you what is supposed to be where it thinks you are, a future version of Goggles would tell you what you're actually looking at.
Sometimes that could be too much information; Goggles doesn't have face recognition because of privacy concerns. And there are already calls for an open augmented reality standard built on openARML (Open Augmented Reality Markup Language), a way to describe points of interest that is itself based on Google's KML.
"The viewfinder [on your phone] is the new browser", says Mike Liebhold from the Institute for the Future; "it's a view of data through the viewfinder. If it's a browser it should follow browser rules; it should be able to render the data independent of the client."
Opening up Goggles
Nalawadi promises that third-party apps will be able to build on Goggles: "Goggles is not just an app - it's a platform. Yes, we do plan to open up the platform as an API, but we are not sure what the platform should be.
"I'm interested in understanding from developers what are the features and capabilities of Goggles that would be good to expose."
And that's when what you could achieve looking at the world through Google Goggles could really change what you see: "What are the interesting apps you can come up with?" Nalawadi asks. "What user experience can you create when you have access to computer vision?"