I used Veo 3 to recreate the first YouTube video, and the results are almost too good

(Image: the first YouTube video alongside a frame from the AI recreation. Credit: Future)

We all know the story of the first YouTube video, a grainy 19-second clip of co-founder Jawed Karim at the zoo, remarking on the elephants behind him. That video was a pivotal moment in the digital space, and in some ways, it is a reflection, or at least an inverted mirror image, of today as we digest the arrival of Veo 3.

Part of Google Gemini, Veo 3 was unveiled at Google I/O 2025 and is the first generative video platform that can, with a single prompt, generate a video with synced dialogue, sound effects, and background noises. Most of these 8-second clips arrive in under 5 minutes after you enter the prompt.

I've been playing with Veo 3 for a couple of days, and for my latest challenge, I tried to go back to the beginning of social video and that YouTube "Me at the Zoo" moment. Specifically, I wondered if Veo 3 could recreate that video.

As I've written, the key to a good Veo 3 outcome is the prompt. Without detail and structure, Veo 3 tends to make the choices for you, and you usually don't end up with what you want. For this experiment, I wondered how I could possibly describe all the details I wanted to derive from that short video and deliver them to Veo 3 in the form of a prompt. So, naturally, I turned to another AI.

Google Gemini 2.5 Pro is not currently capable of analyzing a URL, but Google AI Mode, the brand-new form of search that is quickly spreading across the US, is.

Here's the prompt I dropped into Google's AI Mode:

(Image: Google AI Mode's analysis of the video URL. Credit: Future)

Google AI Mode almost instantly returned a detailed description, which I copied into the Gemini Veo 3 prompt field.

I did do some editing, mostly removing phrases like "The video appears..." and the final analysis at the end, but otherwise, I left most of it and added this at the top of the prompt:

"Let's make a video based on these details. The output should be 4:3 ratio and look like it was shot on 8MM videotape."

It took a while for Veo 3 to generate the video (I think the service is getting hammered right now), and, because it only creates 8-second chunks at a time, it was incomplete, cutting off the dialogue mid-sentence.

Still, the result is impressive. I wouldn't say that the main character looks anything like Karim. To be fair, the prompt doesn't describe, for instance, Karim's haircut, the shape of his face, or his deep-set eyes. AI Mode's description of his outfit was also probably insufficient. I'm sure it would have done a better job if I had fed it a screenshot of the original video.

Note to self: You can never offer enough detail in a generative prompt.

8 seconds at a time

The Veo 3 video zoo is nicer than the one Karim visited, and the elephants are much farther away, though they are in motion back there.

Veo 3 got the film quality right, giving it a nice 2005 look, but not the 4:3 aspect ratio. It also added archaic and unnecessary labels at the top that thankfully disappear quickly. I realize now I should have removed the "Title" bit from my prompt.

The audio is particularly good. Dialogue syncs well with my main character and, if you listen closely, you'll hear the background noises, as well.

The biggest issue is that this was only half of the brief YouTube video. I wanted a full recreation, so I decided to go back in with a much shorter prompt:

Continue with the same video and add him looking back at the elephants and then looking at the camera as he's saying this dialogue:

"fronts and that's that's cool." "And that's pretty much all there is to say."

Veo 3 kept the setting and main character but lost some of the plot, dropping the old-school grainy look of the first generated clip. This means that when I present the two clips together, we lose considerable continuity. It's as if the film crew jumped forward in time and suddenly got a much better camera.

I'm also a bit frustrated that all my Veo 3 videos have nonsensical captions. I need to remember to ask Veo 3 to remove, hide, or put them outside the video frame.

I think about how hard it probably was for Karim to film, edit, and upload that first short video and how I just made essentially the same clip without the need for people, lighting, microphones, cameras, or elephants. I didn't have to transfer footage from tape or even from an iPhone. I just conjured it out of an algorithm. We have truly stepped through the looking glass, my friends.

I did learn one other thing through this project. As a Google AI Pro member, I have two Veo 3 video generations per day. That means I can do this again tomorrow. Let me know in the comments what you'd like me to create.

A 38-year industry veteran and award-winning journalist, Lance has covered technology since PCs were the size of suitcases and “on line” meant “waiting.” He’s a former Lifewire Editor-in-Chief, Mashable Editor-in-Chief, and, before that, Editor in Chief of PCMag.com and Senior Vice President of Content for Ziff Davis, Inc. He also wrote a popular, weekly tech column for Medium called The Upgrade.

Lance Ulanoff makes frequent appearances on national, international, and local news programs including Live with Kelly and Mark, the Today Show, Good Morning America, CNBC, CNN, and the BBC.
