Google's Veo 3 marks the end of AI video's 'silent era'

A jellyfish, two llamas and a baby elephant

(Image credit: Google)

Google's video generation model got a major upgrade
Announced at Google I/O, Veo 3 can combine audio and video in its output
It's an Ultra and US-only feature for now

AI video generation tools such as Sora and Pika can create alarmingly realistic bits of video, and with enough effort, you can tie those clips together to create a short film. One thing they can't do, though, is simultaneously generate audio. Google's new Veo 3 model can, and that could be a game changer.

Announced on Tuesday at Google I/O 2025, Veo 3 is the third generation of the powerful Gemini video generation model. With the right prompt, it can produce videos that include sound effects, background noises, and, yes, dialogue.

Google briefly demonstrated this capability for the video model. The clip was a CGI-grade animation of some animals talking in a forest. The sound and video were in perfect sync.

If the demo can be converted into real-world use, this represents a remarkable tipping point in the AI content generation space.

"We’re emerging from the silent era of video generation," said Google DeepMind CEO Demis Hassabis in a press call.

Lights, camera, audio

He isn't wrong. Thus far, no other AI video generation model can simultaneously deliver synchronized audio, or audio of any kind, to accompany video output.

It's still not clear whether Veo 3 surpasses OpenAI's Sora, the current leader in video generation, on video quality. Like its predecessor, Veo 2, it should be able to output 4K video. Google has previously claimed that Veo 2 is adept at producing realistic and consistent movement.

Regardless, outputting what appear to be fully produced video clips (video and audio) may instantly make Veo a more attractive platform.

It's not just that Veo 3 can handle dialogue. In the world of film and TV, background noises and sound effects are often the work of Foley artists. Now, imagine if all you need to do is describe to Veo the sounds you want behind and attached to the action, and it outputs it all, including the video and dialogue. This is work that takes animators weeks or months to do.

In a release on the new model, Google suggests you tell the AI "a short story in your prompt, and the model gives you back a clip that brings it to life."

If Veo 3 can follow prompts and output minutes or, ultimately, hours of consistent video and audio, it won't be long before we're viewing the first animated feature generated entirely through Veo.

Veo is live today and available in the US as part of the new Ultra tier ($249.99 a month) in the Gemini App and also as part of the new Flow tool.

Google also announced a few updates to its Veo 2 video generation model, including the ability to generate video based on reference objects you provide, camera controls, outpainting to convert from portrait to landscape, and object add and erase.


A 38-year industry veteran and award-winning journalist, Lance has covered technology since PCs were the size of suitcases and “on line” meant “waiting.” He’s a former Lifewire Editor-in-Chief, Mashable Editor-in-Chief, and, before that, Editor-in-Chief of PCMag.com and Senior Vice President of Content for Ziff Davis, Inc. He also wrote a popular, weekly tech column for Medium called The Upgrade.

Lance Ulanoff makes frequent appearances on national, international, and local news programs including Live with Kelly and Mark, the Today Show, Good Morning America, CNBC, CNN, and the BBC.
