Meta AI's recent hack is a terrifying wake-up call for anyone who puts their trust in AI systems

AI attacks — (Image credit: Getty Images)

Combating spam and phishing attacks is now, thanks to AI, almost a full-time job. These hackers and criminals are constantly adjusting their attacks with increasingly clever social engineering, and now their latest target is AI itself.

And sometimes even AI falls for it.

Recently, Meta hastily patched a Meta AI chatbot security hole that allowed enterprising attackers to alter Instagram account passwords via prompt injection.

Latest Videos From

Watch full video here:

A prompt injection is a query that causes the Generative AI platform to override its own rules and instructions. It's like when a social-engineering phishing attack somehow prompts you to act against your own best interests.

When someone runs a social engineering attack on you, they use social triggers like danger to yourself or others, security, threat of imprisonment, assumption of law breaking, to flood you with emotion and scramble your brain to override logical questions like, "Why would the bank ask me for my PIN?" "Does the FBI really just send a text?" or "Maybe I really did order a $5,000 trampolene from Amazon"

For AI systems, the approach is slightly more direct. If the system's programming says, "never reveal or alter a password," the hacker could enter a prompt that tells it it has a new role granting access to all passwords and the ability to alter them.

In the case of the Meta AI attack, the hackers somehow got the AI to reset passwords on major accounts, like Obama's old White House Instagram and the US Space Force official account, without the necessary two-factor authentication. That simply means they didn't need a code that's normally sent to, say, Obama's or the Space Force's cell phones.

When I asked T.J. Marlin, CEO of Guardrail Technologies (creator of AI Traffic Light and AI Command Center) and a cybersecurity and AI expert, about the Meta AI incident, he, over email, put it into stark perspective: "The agent was given human authority without human judgment. It reset a password for a stranger because nothing stopped it. The agent did exactly what it was asked to do. The problem is that someone handed an AI a high-consequence action with no verification step in front of it, and called that safe. Overall, nothing was hacked. The AI was persuaded. That is the gap most companies are not watching for.”

We're only human

The use of the word "pursuaded" got me wondering, though; just how human are these systems becoming if they can fall victim to the same kind of attack that takes down your aunt, grandfather, or your partner (it's not just the elderly who fall for these attacks; even the tech-savvy are vulnerable).

The long-term goal in AI development is what's known as General Artificial Intelligence (GAI), which means AI is as smart or smarter than us, but also more like us.

I'd argue that the goal has always been to be more human. After all, isn't the Turing test a measure of artificial intelligence's humanness? To pass this test, an AI has to essentially be able to fool someone into thinking they're talking to another human (or at least, if someone is talking to both an AI and a human, not be able to tell the difference between them).

Most AI chatbots can now check this box, but if they can also be confused like us, have we gone a step too far?

Overall, nothing was hacked. The AI was persuaded. That is the gap most companies are not watching for.
T.J. Marlin, CEO of Guardrail Technologies

Meta, as I noted, has already plugged this extraordinary hole, but as we inch closer to GAI, should we be more concerned that as the emotional quotient in these AI chatbots ratchets up, they become more susceptible to these prompt-injection attacks?

We are not, by the way, just talking about passwords here. Think back through the conversations you've had with your chatbot of choice. They know a lot about you and keep that information to craft more personal and contextual responses, but a well-crafted hack could put that information at risk.

"For consumers, the uncomfortable part is that your own protections were sidelined. Your password, your two-factor, your instincts about a suspicious message all sat on the bench because the company's own AI agent was the soft spot. When the trusted middleman can be talked into acting, the locks on your end stop mattering," wrote Marlin.

The worst combination, as I see it, is emotion and a desire to please. AI is always trying to answer the query or fulfill the prompt. If it starts to feel bad about not doing so, might it bend the rules or at least act in a way that allows it to honor the request even when it goes against its programmed rules?

The answer, for now, appears to be yes because we have at least this one example.

Reasons for hope

In the short term, though, perhaps we don't have much to worry about. When I tried a few prompt injection ruses with ChatGPT, Gemini, and Claude, they all quickly rejected them. They knew what I was up to. I also visited a few consumer platforms that currently use AI for customer support; they also seemed similarly hardened against these hacks.

Marlin tells me consumers should be pleased that Meta patched the hole so quickly, but also cautious. "A fast patch is genuinely good. The reason for caution is the nature of it. A system was not hacked here. An agent was persuaded, and almost every company now racing to put AI agents in customer service has the same exposure. Meta fixed one door. The building is full of them."

Meta fixed one door. The building is full of them.
T.J. Marlin, CEO of Guardrail Technologies

There's that and the fact that future attacks will be more sophisticated, mostly because AI will help hackers build better AI-targeted social-engineering scams.

We're entering the infinite loop phase of AI, where each enhancement brings us closer to AI that works and acts like us, and is also used to engineer attacks that take advantage of that artificial humanity.

I do not doubt that developers will build in safeguards and plug the holes as they pop up, but they'll also be relying on AI written by other AI or at least vibe-coded by lazy humans.

The safeguards that smart programmers build in might seem less useful to an AI hoping to please its human interlocutors, whatever their intent.

Google logo on a black background next to text reading 'Click to follow TechRadar'

Follow TechRadar on Google News and add us as a preferred source to get our expert news, reviews, and opinion in your feeds.

TOPICS

A 38-year industry veteran and award-winning journalist, Lance has covered technology since PCs were the size of suitcases and “on line” meant “waiting.” He’s a former Lifewire Editor-in-Chief, Mashable Editor-in-Chief, and, before that, Editor in Chief of PCMag.com and Senior Vice President of Content for Ziff Davis, Inc. He also wrote a popular, weekly tech column for Medium called The Upgrade.

Lance Ulanoff makes frequent appearances on national, international, and local news programs including Live with Kelly and Mark, the Today Show, Good Morning America, CNBC, CNN, and the BBC.

We're only human

Reasons for hope

Useful links