'Not just development tools': Security experts discover critical flaw in OpenAI's Codex which could compromise entire enterprise organizations


  • BeyondTrust Phantom Labs finds critical command injection flaw in OpenAI’s ChatGPT Codex
  • Vulnerability let attackers steal GitHub OAuth tokens via malicious branch names
  • OpenAI patched with stronger input validation, shell escaping, and token controls

Experts have claimed OpenAI’s ChatGPT Codex carried a critical command injection vulnerability which allowed threat actors to steal sensitive GitHub authentication tokens.

This is according to BeyondTrust’s research department, Phantom Labs, whose work helped OpenAI identify and patch the flaw.

ChatGPT Codex is a coding feature within the famed chatbot that helps users write and edit software using plain-language instructions. Users can turn plain-language requests into working code, and the tool can suggest fixes and improvements in the same way.


When a developer makes changes to a GitHub project, they typically work in a separate branch of the repository. According to BeyondTrust Phantom Labs, the problem stems from the way Codex processes branch names during task creation.

Apparently, the tool allowed a malicious actor to manipulate the branch parameter and inject arbitrary shell commands while Codex set up its environment.
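The general class of bug can be sketched in a few lines. This is an illustrative example only, not Codex's actual internals: it assumes a setup step that interpolates a branch name directly into a shell command string, which is the pattern the researchers describe.

```python
import subprocess

def setup_env_unsafe(branch: str) -> str:
    # VULNERABLE (illustrative): the branch name is pasted straight into a
    # shell string, so shell metacharacters in it (; && $(...)) are executed
    # as additional commands rather than treated as part of the name.
    result = subprocess.run(
        f"echo checking out {branch}",
        shell=True, capture_output=True, text=True,
    )
    return result.stdout

# A crafted "branch name" that smuggles in a second command:
payload = "main; echo INJECTED"
print(setup_env_unsafe(payload))
```

Here the injected `echo` stands in for an attacker's arbitrary command; in the real attack, the injected command could read credentials available inside the container.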

These commands ran with full access inside the container, so an attacker could execute arbitrary, malicious code. Phantom Labs said it was able to pull GitHub OAuth tokens this way, gain access to a theoretical third-party project, and use the tokens to move laterally within GitHub.

Unfortunately, it gets worse. Codex's command-line interface, SDK, and development environment integrations were all flawed in the same way, and the researchers said that by embedding malicious payloads in GitHub branch names they would be able to compromise numerous developers working on the same project.

After responsibly disclosing the findings to OpenAI, the company fixed the problem with improved input validation, stronger shell escaping protections, and better controls over token exposures inside containers. Token scope and lifetime during task creation were also limited, it was said.

AI coding agents are “live execution environments with access to sensitive credentials and organizational resources,” the researchers concluded.

“Because these agents act autonomously, security teams must understand how to govern AI agent identities to prevent command injection, token theft, and automated exploitation at scale. As AI agents become more deeply integrated into developer workflows, the security of the containers they run in—and the input they consume—must be treated with the same rigor as any other application security boundary.”



Sead is a seasoned freelance journalist based in Sarajevo, Bosnia and Herzegovina. He writes about IT (cloud, IoT, 5G, VPN) and cybersecurity (ransomware, data breaches, laws and regulations). In his career, spanning more than a decade, he’s written for numerous media outlets, including Al Jazeera Balkans. He’s also held several modules on content writing for Represent Communications.
