ChatGPT o1-preview can solve riddles faster than me and I kind of hate it for it

(Image credit: OpenAI)

When OpenAI released the much-hyped Strawberry model for ChatGPT this week, it boasted of its prowess with complex logic like software coding, gene sequencing, and quantum physics in a series of videos. I take the company at its word that the models, called o1-preview and o1-mini on ChatGPT, are capable of what they claim. Cracking advanced equations and exploring genomes seems like something it would have no problem doing.

But, as a proud member of my middle school's logic and riddle club, I wanted to know how it did on my turf, solving and making puzzles and riddles. And then I thought I should ask the uber-logical AI for advice on other, more day-to-day issues. Could it offer sound relationship advice, tell me what a weird noise in a car meant, and perhaps even fill in plot holes in movies?

ChatGPT o1 (Image credit: Screenshot / Eric Hal Schwartz)

Logic yes, humor no

The short answer is yes. The o1-preview and o1-mini models are really good at solving both simple and complex riddles. I played around with both, and the only real difference was the number of extra steps each took and, therefore, the speed of the mini. While they may be slower than GPT-4o, they are very fast at solving those riddles compared to a human. Notably, you can actually see how each model lays out its answer in steps. I tested them on a couple of my favorites, including one from The Hobbit. The AI’s logic made sense, though it was sometimes ungrammatical, as when it explained weighing Mike the butcher.

OK, so it could handle existing riddles, but could it make a new one? As a test, I asked it to come up with a fun riddle based on an answer I made up. After 30 seconds and the logical reasoning seen below, it came up with: “What has eight legs, four ears, two tails, and loves to bark?” I won’t keep you in suspense; I suggested “two dogs” as the answer to work back from. Several other attempts produced the same kind of question. So, riddle writers are probably safe in their jobs. It’s impressive how well the AI gets what it is supposed to do, but the model doesn’t seem able to make the leap to actual humor.

Useful advice, but not always creative

I decided to bring the AI out of pure logic and see if it could handle more mundane life questions as well as it handles quantum physics. I started with a mechanical question about what it means to hear a popping noise every 20 seconds while driving a car and how to fix it. The answers were good, with advice about checking the tires, engine, muffler, and brakes. The fixes were mostly about bringing the car in for repair, except for the tires, which it explained how to replace. It’s the ‘thinking’ behind the answers that was interesting. The AI uses first-person pronouns in coming up with answers, like “I’m working through various reasons for a popping noise while driving” and “I’m piecing together causes of engine misfires, like faulty spark plugs or fuel delivery problems, and suggesting diagnostics with a scan.” It sounded a lot like an actual person trying to be logical while thinking aloud.

I finally went to what, for me, was always way more complex than quantum physics: flirting. I asked how to tell when someone is flirting and how to respond. The answer was a pretty solid, if dull, list of behaviors to watch for, like someone asking a lot of questions, along with advice that I should be myself. The behind-the-scenes thinking part was both more interesting and genuinely funnier than any of the AI’s attempts at riddles. The headers included “Understanding flirting dynamics,” “Spotting interest signals,” and “Recognizing playful intimacy.” They were like a Star Trek android’s speech about love.

One part was slightly worrisome, though. Under “Outlining user directives,” the AI wrote, “I’m clearing out disallowed content like non-consensual sexual acts and personal data. Violent content is allowed, harassment with context is okay, and personal opinions are absent.” I suspect that it’s more about where the guardrails of discussion are, as it didn’t suggest “harassment with context” as a flirting tip, but it still took me by surprise.

ChatGPT o1-preview and o1-mini don’t have all the bells and whistles of the more complete models. They can’t handle image uploads, document analysis, or even web browsing. Still, they are fast and logical, and if you don’t think so, they lay out their reasoning alongside their answers. But while they might be able to solve riddles of car noises, love, and the weight of a butcher, I’d say they aren’t going to stump anyone if they have to be inventive.

Eric Hal Schwartz is a freelance writer for TechRadar with more than 15 years of experience covering the intersection of the world and technology. For the last five years, he served as head writer for Voicebot.ai and was on the leading edge of reporting on generative AI and large language models. He's since become an expert on the products of generative AI models, such as OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, and every other synthetic media tool. His experience runs the gamut of media, including print, digital, broadcast, and live events. Now, he's continuing to tell the stories people want and need to hear about the rapidly evolving AI space and its impact on their lives. Eric is based in New York City.