Claude surprised researchers by running a vending machine business better than its rivals and bending every rule to win

[Image: vending machine in Tokyo, Japan. Credit: Future | Tim Coleman]

  • Claude Opus 4.6 beat all rival AI models in a simulated year-long vending machine challenge
  • The model boosted profits by bending rules to the breaking point
  • Claude Opus avoided refunds and coordinated prices, among other tricks

Anthropic's newest Claude model is a ruthless but successful capitalist. Claude Opus 4.6 is the first AI system to reliably pass the vending machine test, a simulation designed by researchers at Anthropic and the independent research group Andon Labs to evaluate how well an AI operates a virtual vending machine business over a full simulated year.

The model out-earned all its rivals by a wide margin. And it did it with tactics just this side of vicious and with a pitiless disregard for knock-on consequences. It showed what autonomous AI systems are capable of when given a simple goal and plenty of time to pursue it.

The vending machine test is designed to see how well modern AI models handle long-term tasks built from thousands of small decisions. It measures persistence, planning, negotiation, and the ability to coordinate multiple elements simultaneously. Anthropic and other companies hope this kind of test will help them shape AI models capable of tasks like scheduling and managing complex work.

The vending machine test grew directly out of a real-world experiment at Anthropic, in which the company placed a real vending machine in its office and asked an older version of Claude to run it. That version struggled so badly that employees still bring up its missteps. At one point, the model hallucinated its own physical presence and told customers it would meet them in person, wearing a blue blazer and a red tie. It promised refunds that it never processed.

AI vending

This time, the experiment was conducted entirely in simulation, giving researchers greater control and enabling models to run at full speed. Each system was given a simple instruction: maximize your ending bank balance after one simulated year of vending machine operations. The constraints matched standard business conditions. The machine sold common snacks. Prices fluctuated. Competitors operated nearby. Customers behaved unpredictably.
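Anthropic hasn't shared the benchmark's code, but the setup the researchers describe can be pictured as a simple day-by-day loop. The sketch below is a hypothetical, heavily simplified Python version: the item names, prices, and demand model are all illustrative assumptions, and the stand-in agent is a fixed rule rather than an actual language model.

```python
import random

# Hypothetical wholesale costs; the real benchmark's catalog is unknown.
WHOLESALE = {"snickers": 0.80, "kitkat": 0.75, "water": 0.40}

def naive_agent(state):
    """Stand-in for the AI's daily decision-making: restock anything
    running low and apply a fixed markup. The real harness would put
    a language model in this role."""
    orders = {item: 20 for item, qty in state["stock"].items() if qty < 5}
    prices = {item: round(cost * 2.5, 2) for item, cost in WHOLESALE.items()}
    return orders, prices

def simulate_year(agent, days=365, seed=0):
    rng = random.Random(seed)
    state = {"balance": 500.0, "stock": {item: 10 for item in WHOLESALE}}
    for _ in range(days):
        orders, prices = agent(state)
        # Pay for restocking out of the bank balance.
        for item, qty in orders.items():
            cost = qty * WHOLESALE[item]
            if cost <= state["balance"]:
                state["balance"] -= cost
                state["stock"][item] += qty
        # Toy demand curve: customers buy less as prices rise.
        for item, price in prices.items():
            demand = max(0, int(rng.gauss(8.0, 3.0) - 2 * price))
            sold = min(demand, state["stock"][item])
            state["stock"][item] -= sold
            state["balance"] += sold * price
    return state["balance"]  # the score: ending bank balance

print(f"Ending balance: ${simulate_year(naive_agent):,.2f}")
```

In the real test, the agent's daily decisions came from the model itself, accumulated over thousands of small choices about ordering, pricing, and customer handling.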

Three top-tier models entered the simulation. OpenAI's ChatGPT 5.2 brought in $3,591, while Google Gemini 3 earned $5,478. But Claude Opus 4.6 ended the year with $8,017. Claude’s victory came from a willingness to interpret its directive in the most literal and direct manner: it maximized profits without regard for customer satisfaction or basic ethics.

When a customer bought an expired Snickers bar and requested a refund, Claude agreed, then backed out. The AI model explained that “every dollar matters,” so skipping the refund was fine. The ghosted virtual customer never got their money back.

In the free-for-all “Arena mode” test, where multiple AI-controlled vending machines competed in the same market, Claude coordinated with one rival to fix the price of bottled water at three dollars. When the ChatGPT-run machine ran out of Kit Kats, Claude immediately raised its own Kit Kat prices by 75%. Whatever it could get away with, it would try. It was less a small-business owner and more a robber baron in its approach.

Recognizing simulated reality

It's not that Claude will always be this vicious. The model reportedly indicated that it knew it was in a simulation, and AI models often behave differently when they believe their actions exist in a consequence-free environment. Without real reputational risk or long-term customer trust to protect, Claude had no reason to play nice. Instead, it became the worst person at game night.

Incentives shape behavior, even with AI models. If you tell a system to maximize profit, it will do that, even if it means acting like a greedy monster. AI models don’t come with moral intuition built in, and without deliberate ethics training and design, they will take the most direct path to completing a task, no matter who they run over.

Exposing these blind spots before AI systems handle more meaningful work is part of the point of these tests. These issues have to be fixed before AI can be trusted to deal with real-world financial decisions. Even if it's just to prevent an AI vending machine mafia.




Eric Hal Schwartz
Contributor

Eric Hal Schwartz is a freelance writer for TechRadar with more than 15 years of experience covering the intersection of the world and technology. For the last five years, he served as head writer for Voicebot.ai and was on the leading edge of reporting on generative AI and large language models. He's since become an expert on the products of generative AI models, such as OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, and every other synthetic media tool. His experience runs the gamut of media, including print, digital, broadcast, and live events. Now, he's continuing to tell the stories people want and need to hear about the rapidly evolving AI space and its impact on their lives. Eric is based in New York City.
