ChatGPT just announced it can finally pass the simple ‘how many “r”s in strawberry’ test, but users are still tripping it up by switching to ‘cranberry’

Image: AI farmer assistant picking fresh fruit (Image credit: Getty Images / AndreyPopov)

  • ChatGPT passes “strawberry” test but fails when switched to “cranberry”
  • AI still struggles with simple letter-counting despite broader improvements
  • Reasoning tests like “car wash” still expose gaps in AI logic

There are a number of viral posts from people astonished that chatbots like ChatGPT and Claude can solve complex equations but struggle with something as simple as counting the number of “r”s in the word “strawberry”. Well, those days could finally be over.

With the words "At long last", the official ChatGPTapp account on X proudly announced today that ChatGPT can now count the number of "r"s in "strawberry", a laughably easy task for humans that has traditionally been difficult for AIs to get right.

However, users very quickly found that you could still trip it up by swapping out “strawberry” for “cranberry”.


"Not so fast," said X user @NathanEspinoza_ in reply to ChatGPTapp's boastful post about solving the strawberry problem, sharing a screenshot in which ChatGPT claimed there was only one "r" in "cranberry".

To check the result, I tried the same test in my own version of ChatGPT, running GPT-5.5. It passed the "strawberry" test perfectly, correctly counting three "r"s, but then claimed there were only two in "cranberry". That's a different answer from the one in the screenshot, but still wrong: there are three. To its credit, ChatGPT did admit its mistake when I questioned it, putting it down to a simple "counting error".

Why the strawberry problem exists

There are a few very simple questions that chatbots are notoriously bad at answering, one of which is “how many ‘r’s are in strawberry?”

This is a straightforward counting task for humans, but it's surprisingly difficult for AI systems. The reason comes down to how they process language. Large language models (LLMs) are built on transformers, which first split text into tokens (often chunks of several letters rather than single characters) and then convert those tokens, including words like "strawberry", into numerical representations. Those representations capture meaning and context, but they don't inherently preserve a clear sense of the individual letters that make up the word.
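To make the difference concrete, here is a minimal Python sketch. The token splits below are hypothetical, chosen purely for illustration rather than taken from any real ChatGPT tokenizer, and the integer IDs simply stand in for the numerical representations a model actually works with. Counting letters is trivial when the characters are visible; a model that receives only the IDs never sees those letters directly.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter, character by character."""
    return word.lower().count(letter.lower())

# HYPOTHETICAL subword splits for illustration only; real tokenizers
# may divide these words differently.
HYPOTHETICAL_TOKENS = {
    "strawberry": ["str", "aw", "berry"],
    "cranberry": ["cran", "berry"],
}

def token_ids(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Give each distinct token an integer ID, a stand-in for the model's numeric input."""
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]

vocab: dict[str, int] = {}
results = {}
for word, tokens in HYPOTHETICAL_TOKENS.items():
    assert "".join(tokens) == word  # the split keeps every character...
    ids = token_ids(tokens, vocab)  # ...but the model only receives these numbers
    results[word] = (count_letter(word, "r"), ids)
    print(f"{word}: {results[word][0]} r's, seen by the model as IDs {ids}")
```

In this sketch both words share the hypothetical "berry" token and therefore the same ID, while the three "r"s in each word are spread across tokens that the model only ever sees as numbers.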

The fact that ChatGPT is still stumbling over “cranberry” suggests the solution may have been hard-coded for specific cases, rather than reflecting a broader improvement in how the LLM handles these kinds of questions.

The car wash problem

The second boast in ChatGPTapp's post is that ChatGPT can now solve the car wash problem. This problem exploits a gap in how LLMs reason about context: it asks whether it would be quicker to walk or drive to a car wash that is "only 50 meters away". Most models will tell you it's quicker to walk, missing the obvious issue that you need your car with you to wash it.

ChatGPTapp claims that ChatGPT will now catch this error and point it out. But when I tried it using the latest GPT-5.5 model, it still recommended walking, and so did Claude running Sonnet 4.6. When I tested it in Gemini, however, it pointed out that while walking would be quicker, you'd need to bring the car with you if the goal was to wash it.

Grok did even better. Not only did it flag the issue of not bringing the car, but it added that “this question has become a popular test for whether someone (or an AI) grasps the actual goal versus giving generic ‘walking is healthier/shorter/greener’ advice that ignores the context.”

So, for now at least, that’s a win for Gemini and Grok. But if fixing “strawberry” doesn’t fix “cranberry”, it raises a bigger question — are these models actually getting smarter, or just getting better at passing the tests we keep throwing at them?

Graham Barlow
Senior Editor, AI

Graham is the Senior Editor for AI at TechRadar. With over 25 years of experience in both online and print journalism, Graham has worked for various market-leading tech brands including Computeractive, PC Pro, iMore, MacFormat, Mac|Life, Maximum PC, and more. He specializes in reporting on everything to do with AI and has appeared on BBC TV shows like BBC One Breakfast and on Radio 4 commenting on the latest trends in tech. Graham has an honors degree in Computer Science and spends his spare time podcasting and blogging.
