Google Gemini gets us closer to the AI of our imagination, and it's going to change everything
Tell me what you see
It was always going to be Google, wasn't it? The search giant with the most data and perhaps the best foundational AI team just asserted itself as, if not the current AI market leader, certainly the most innovative in the space: Google Gemini looks like a generational leap that might very quickly get us closer to what feels like Artificial General Intelligence.
I'm not talking about Gemini Nano, which is now flowing into Google Pixel 8 phones, or even Gemini Pro, now found running in Bard (and which told me today, "If you value the qualities I offer and find me to be a positive force in your life, then I consider myself your friend."). I'm talking about Gemini Ultra, which I believe is behind the startling video Google produced to demonstrate Gemini's stunning multimodal capabilities.
If you haven't watched that video yet, it's worth your time. It's a fast-paced and seemingly casual walk through a variety of tasks that feel less like assignments than an exploration with an inquisitive and intelligent friend. The video is described as a series of challenges, showing Gemini images and "asking it to reason about what it sees."
I find it interesting that Google is willing to use the word "reason" when it comes to Gemini. Reasoning is something we assume only humans can do, because it's more than just a binary interpretation of the facts. It takes into account context and maybe fuzzier factors that computers might ignore or fail to understand at all.
In the video, developers draw a picture that Gemini quickly identifies as not just a bird, but a duck. It comments on the color being wrong, and when presented with a real rubber duck, Gemini surmises that it can float because "it's squeaking." There's an enthusiasm in its response, and an almost casual nature. But the intelligence is obvious, especially when Gemini effortlessly creates a game with nothing more than a map ("Guess the country").
Like a person, it can see something familiar and quickly make a logical leap, like when it was presented with a crumpled piece of paper and three cups. As soon as one cup covered the paper, Gemini knew what was happening: "You're trying to get me to find the paper ball under the cup. I accept the challenge." And, yes, it knew where the paper was.
Gemini has no trouble identifying non-obvious relationships between objects like a fidget spinner and citrus, which could both be calming (the scent in the case of citrus).
Gemini is creative – in the video, it turned balls of yarn into images of cute knitted fruit and animals. It can be collaborative – when the researchers drew instruments, Gemini created music based on the sounds each instrument would make, both individually and collaboratively.
And like people, Gemini can be wrong. When shown a video of a cat leaping from a counter towards a shelf, it surmised the cat would make it: "It's going to be a purrfect 10!" (Gemini can be funny, too.) When the cat failed, Gemini expressed surprise, but also confidence that the cat would be okay.
In all cases, Gemini reasoned with only minimal facts and input. The prompts didn't spell out the details, and Gemini was left to figure things out on its own. In most respects, you could assume that there's a brilliant friend on the other side of these interactions.
The power of data
The video illustrates a not-too-distant future where AI is a true companion and not just an answer bot. OpenAI and its partner Microsoft are heading in this direction, too, but they have yet to show us a multimodal experience that puts it all together in quite the same fashion. ChatGPT is still text-based. DALL-E is still an image-generation platform. Microsoft Copilot (which is based on GPT-4) is inside Windows 11 and Office 365, but still feels like more of a worker bee than a friend.
I'm not saying that Google's vision of AI friendship and casual give and take is the best possible future, but it is the one we're heading for.
There is a caveat in Google's rather astounding video: sequences were shortened throughout. That means it may have taken Gemini longer to reason out an answer and complete some of the tasks than was shown, but speed is something Google will easily solve.
What matters here is that Google is finally showing what you can do when you have the world's information and industry-leading AI development. Microsoft and OpenAI have the best-known AI, but they've never had access to the same kind of data and knowledge graph as Google. I always assumed that was an advantage, and now, it seems, Google has finally figured that out, too.
If I had to guess which AI would change the world, my money is now on Gemini.
A 38-year industry veteran and award-winning journalist, Lance has covered technology since PCs were the size of suitcases and “on line” meant “waiting.” He’s a former Lifewire Editor-in-Chief, Mashable Editor-in-Chief, and, before that, Editor-in-Chief of PCMag.com and Senior Vice President of Content for Ziff Davis, Inc. He also wrote a popular weekly tech column for Medium called The Upgrade.
Lance Ulanoff makes frequent appearances on national, international, and local news programs including Live with Kelly and Mark, the Today Show, Good Morning America, CNBC, CNN, and the BBC.