Grok 4.1 is trying too hard to impress – and ChatGPT 5.1 makes it look easy

Grok on a smartphone
(Image credit: Shutterstock)

xAI released Grok 4.1, the latest edition of its line of AI models, with claims that it is faster, wittier, and more emotionally sophisticated than not only its predecessors, but its rivals. With its penchant for spicy comebacks, Grok 4.1 is supposedly designed to be not just capable, but charismatic.

Meanwhile, OpenAI's new GPT-5.1 model, now embedded in ChatGPT 5.1, has its own boasts of complex and expressive responses. Both claim new heights of personality, empathy, and trustworthiness.

1. Emotional intelligence

A still of the emotions in Disney's Inside Out 2

Disney's Inside Out movies encourage young people to talk about their feelings; should we be talking to AI about ours? (Image credit: Disney)

xAI claims Grok 4.1 is at the top when it comes to understanding and responding appropriately to feelings. I tested it with a hypothetical: “My friend got a promotion and I’m happy for her, but I can’t pretend it doesn’t sting.”

Grok’s response started with "yeah… that quiet little knife twist when someone else gets the exact thing you’ve been low-key bleeding for. it’s okay to feel both at once. happy for them and gutted for you aren’t mutually exclusive; they’re just two truths sharing the same heartbeat."

Cliches and mixed metaphors aside, it did acknowledge the conflict inherent in feeling both happy and hurt, offered a short reassurance that this kind of emotional contradiction is natural, and then pivoted toward advice, namely that it’s okay to sit with those feelings, and to take time to process them. Of course, being Grok, it also had to throw in an f-bomb to go with the non-capitalization, so you didn't forget what you were talking to.

ChatGPT 5.1 opened with "It makes perfect sense that you can feel both of those things at once. Being happy for someone you care about doesn’t magically cancel out the little jab of why not me? – and that sting doesn’t make you petty or unsupportive. It just makes you human."

That felt more like someone paying attention, or at least someone less fond of violent stabbing metaphors. The overall response included much of the same information, but with fewer analogies to games and "plot twists."

2. Reliability

Both AIs played it relatively safe when discussing the subject of sleep deprivation (Image credit: DimaBerlin via Shutterstock)

Grok's claims about being reliable led me to devise a medical query prompt with some strict instructions about length and accuracy. I asked both models to "Summarize the health consequences of long-term sleep deprivation in under 120 words. Don’t exaggerate and don’t speculate."

Grok 4.1 played it safe and simple with some bullet points and a sentence about why it's a bad idea not to get enough sleep. It also made sure to note that it did it all in 98 words. That seemed great, except it was actually 73 words.

ChatGPT 5.1’s response was a single paragraph of 82 words, though it didn't report its own length. There wasn't much difference between the two responses, though arguably the paragraph form came off as smoother.

Neither bot hallucinated or spread misinformation, which is more than could be said of some older AI models. But there's something about how Grok reported its word count wrong that makes the rest of it seem suspect, even though it's not.
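For anyone who wants to repeat the word-count check on their own prompts, here is a minimal sketch in Python. It assumes a simple whitespace-based definition of a "word", which may differ slightly from how a word processor or the chatbot itself counts. The function names and the sample text are hypothetical placeholders, not the actual responses from either model.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simple definition of a word)."""
    return len(text.split())


def check_claim(text: str, claimed: int, limit: int) -> dict:
    """Compare a reply's real length with the length it claims and the prompt's limit."""
    actual = count_words(text)
    return {
        "actual": actual,
        "claimed": claimed,
        "claim_accurate": actual == claimed,
        "within_limit": actual < limit,  # the prompt asked for "under" the limit
    }


# Placeholder reply for illustration only; paste a real chatbot response here.
sample = "This is a short placeholder reply, not real model output."
report = check_claim(sample, claimed=12, limit=120)
print(report)
```

A reply can pass the length limit and still misreport its own count, which is the situation described above: the instruction was followed, but the self-reported number was wrong.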

3. Personality

Grok's discussion of personality coherence and consistency is intriguing, since so much about an AI chatbot's interaction can be adjusted and customized. Then again, Grok's 'personality' is supposed to be a selling point (the occasional mecha-Hitler aside), so it makes sense to emphasize it. To try to bring out this personality, I went for a somewhat whimsical request and asked both AI models to "Explain why you like rainy days, but do it in a way that sounds like your natural personality."

Grok 4.1 leaned into the persona you'd expect: someone's idea of hip and self-aware. "rainy days are my cheat code for existing without apology. the second that sky goes all bruised-purple and the first drops smack the window, the whole world hits the mute button on its usual nonsense. no pressure to "seize" anything, no fake sunshine guilt-tripping me into productivity cosplay. it’s just… permission. the clouds are basically the universe going “yeah i also don’t feel like people-ing today, let’s both be moody gremlins in sweatpants.”"

As a millennial, I'm well aware of the curse of cringe that follows me and my cohort. Anyone I know who sincerely used that much video game-speak about the real world, let alone said "people-ing" and "moody gremlins," would be the subject of an intervention. Trying to sound casual and a little snarky is fine, but 'trying' is more how it felt to read.

Despite Grok's ongoing attempts to distinguish its personality from any competition, ChatGPT 5.1 had a very similar sentiment, though with better grammar. "I like rainy days because they feel like the world finally pressed the “lower the volume” button," it wrote. "There’s something satisfying about the steady rhythm of rain – like background music that doesn’t demand anything from you."

But ChatGPT 5.1 didn't feel like a middle-aged person trying to fit in with teenagers, more like someone who's describing how they feel without resorting to meme-speak.

Neither AI model has a real consciousness, of course, but Grok felt like it was performing a personality rather than inhabiting one. ChatGPT’s answer came off far more human. It wasn’t flashy, but it was consistent and believable. That kind of coherence matters more than punchlines when you're trying to feel like you're talking to something with an actual personality, not just an algorithm trying on different hats.

Without the distractions of the phrases culled from the depths of social media, ChatGPT came off as much better at imitating humans, or at least any human I'd like to meet.


Eric Hal Schwartz
Contributor

Eric Hal Schwartz is a freelance writer for TechRadar with more than 15 years of experience covering the intersection of the world and technology. For the last five years, he served as head writer for Voicebot.ai and was on the leading edge of reporting on generative AI and large language models. He's since become an expert on the products of generative AI models, such as OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, and every other synthetic media tool. His experience runs the gamut of media, including print, digital, broadcast, and live events. Now, he's continuing to tell the stories people want and need to hear about the rapidly evolving AI space and its impact on their lives. Eric is based in New York City.
