I pitted the new Google Gemini against ChatGPT for AI image generation – and the results shocked me

AI image generated by Imagen 4 in Google Gemini. (Image credit: Google)

Not a day goes by without another ChatGPT image trend appearing in your social feeds. A couple of days ago, for example, it was turning your photos into Renaissance art. So why don't we see the same thing happening with Gemini, which can also generate AI images?

It's because the AI image trend took off after ChatGPT got a major image-generation upgrade in March, while Gemini was still relying on Imagen 3, which had some limitations.

Well, yesterday at Google I/O, all Gemini users (free and paid) got a free upgrade to Imagen 4, which offers much better image quality and typography, can produce images at up to 2K resolution, and supports aspect ratios other than 1:1.

Best of all, Imagen 4 is live right now, and you can use it simply by going to gemini.google.com or opening the mobile app.

The big question is: can the new Imagen 4 in Gemini now replace ChatGPT for image generation? Let's find out.

Gemini and ChatGPT compared

First, let's look at limits. Google is quite upfront about how many images a day you can generate. In Gemini, free users can generate 10-20 images a day, while Gemini Advanced subscribers can generate 100-150, depending on server demand.

With ChatGPT, usage limits are more opaque and vary much more depending on how many people are using it. For example, ChatGPT currently tells me that image generation isn't even available to free users, while ChatGPT Plus subscribers can generate "a few dozen images per day". Even so, it did let me generate an image in the free version, and I've typically found that I can get about three or four images a day before hitting the limit on the free tier.

To test Gemini against ChatGPT, I used a ChatGPT Plus account and a Gemini Advanced account so that I wouldn't have to worry about hitting usage limits. I also used prompts provided by OpenAI and Google. Since each company's prompts are likely chosen to highlight the strengths of its own image generator, I split the tests equally between Google-written and OpenAI-written prompts.

1st test - a cinematic image

First up was a prompt provided by Google:

Prompt: Filmed cinematically from the driver's seat, offering a clear profile view of the young passenger on the front seat with striking red hair. Her gaze is fixed ahead, concentrated on navigating the dusty, lonely highway visible through her side window, which shows a blurred expanse of dry earth and perhaps distant, hazy mountains. Her arm rests on the window ledge or steering wheel. The shot includes part of the aged truck interior beside her – the door panel, maybe a glimpse of the worn seat fabric. The lighting could be late afternoon sun, casting long shadows and warm highlights across her face and the truck's interior. This angle emphasizes her individual presence and contemplative state within the vast, empty landscape.

Unsurprisingly, Gemini produced a fantastic image from this prompt that really showcased the power of Imagen 4:

Gemini-generated AI image. (Image credit: Google)

In contrast, ChatGPT produced this:

ChatGPT-generated image. (Image credit: OpenAI)

It's not bad, and the model's arm is resting on the wheel as requested, but the wheel itself isn't visible, which makes the scene look less realistic, as does the truck, which looks more like a car. ChatGPT's image is also much darker, which is a great way to hide details but results in a less striking image than the one Gemini produced.

Verdict: Gemini is the winner here. It generated an image that is much closer to what we asked for and that looks incredibly realistic. I'm impressed!

2nd test - an image of friends

The second prompt was provided by OpenAI:

Prompt: Generate a candid, Polaroid-style photograph of four diverse friends in their early 20s at a gritty dive bar. The lighting features a very harsh, direct flash, creating sharp shadows and giving the photo a very overexposed, vintage instant-camera feel. Colors should be slightly muted, evoking nostalgic, early-2000s party vibes. The aesthetic is casually emo. No border or logos or signs. There's an interesting looking wall behind them with some light graffiti. The quality of the image should be very sharp and detailed (very little grain). The energy should be silly and chaotic. They're either playfully grimacing, smiling, or pretending to look tough. One of them should have their friend in a silly, playful headlock. Their mouths are closed.

Gemini, which clearly has a problem counting to four, produced this image:

Gemini-generated image for "four" friends. (Image credit: Google)

ChatGPT generated this image:

An image of four friends generated by ChatGPT. (Image credit: OpenAI)

Verdict: I don't think either model did an especially good job of representing a "diverse" group, but at least ChatGPT got the number of people right. The winner is ChatGPT.

3rd test - an object with text on it

Google is really stressing how much Imagen 4 has improved at typography, so I chose a test that requires rendering text in an image.

The prompt I used was provided by Google:

Prompt: Capture an intimate close-up bathed in warm, soft, late-afternoon sunlight filtering into a quintessential 1960s kitchen. The focal point is a charmingly designed vintage package of all-purpose flour, resting invitingly on a speckled Formica countertop. The packaging itself evokes pure nostalgia: perhaps thick, slightly textured paper in a warm cream tone, adorned with simple, bold typography (a friendly serif or script) in classic red and blue “ALL-PURPOSE FLOUR”, featuring a delightful illustration like a stylized sheaf of wheat or a cheerful baker character. In smaller bold print at the bottom of the package: “NET WT 5 LBS (80 OZ) 2.27kg”. Focus sharply on the package details – the slightly soft edges of the paper bag, the texture of the vintage printing, the inviting "All-Purpose Flour" text. Subtle hints of the 1960s kitchen frame the shot – the chrome edge of the counter gleaming softly, a blurred glimpse of a pastel yellow ceramic tile backsplash, or the corner of a vintage metal canister set just out of focus. The shallow depth of field keeps attention locked on the beautifully designed package, creating an aesthetic rich in warmth, authenticity, and nostalgic appeal.

Gemini produced this image:

A Gemini-generated bag of flour. (Image credit: Google)

ChatGPT provided this image:

ChatGPT-generated image of a bag of flour. (Image credit: OpenAI)

Verdict: I think both models did a great job of producing readable text, but ChatGPT introduced some inaccuracies: "LS" instead of "LBS" and "2,27" instead of "2.27". The winner is Gemini.

4th test - a lot of words in an image

The previous test only added a few words to an image, but what happens when you need much more text? I tried this prompt provided by OpenAI:

Prompt: Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign. Context: a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)\n Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic. Characters: one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs. Composition from background to foreground: streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

Gemini produced this image:

Gemini-generated image of witches at a signpost. (Image credit: Google)

ChatGPT produced this image:

ChatGPT-generated image of witches at a signpost. (Image credit: OpenAI)

Verdict: The Gemini image is brighter, but the clear winner here is ChatGPT, which is much better at producing road signs, at least. The Gemini example is littered with text errors, while the ChatGPT one is much cleaner, though still imperfect: "vehicles" and "forbidden" are both misspelled.

Who wins?

Overall, I think Imagen 4 in Gemini is impressive. I love the level of detail in its images, and they feel brighter than those generated by ChatGPT, which look muddy by comparison. Perhaps most impressive of all is the speed: Gemini generated all our test images in seconds, rather than the minutes ChatGPT required.

There are still areas where ChatGPT is superior, such as images with lots of text. However, for sheer speed and convenience, it will be hard not to default to Gemini now when I need a quick AI image.

But here's the rub: if you upload a photo of yourself to Gemini and ask it to turn it into a Studio Ghibli-style image (the trend that kicked off the whole ChatGPT image craze), it just can't do it.

It's not even worth running as a formal test, because Gemini fails spectacularly. I uploaded a picture of myself with the prompt "Transform this image in the style of Studio Ghibli. Just like their biggest fan and admirer would, training for years to master the technique to near perfection" and it simply generated a random Studio Ghibli-style image of a group of people, none of whom resembled me at all. Moral issues aside, ChatGPT had no problem producing a Ghibli-style image of me.

Transforming existing images is a key area where ChatGPT has a clear advantage over Gemini. However, if you simply want to produce an AI image from scratch, Gemini can't be ignored.