You can’t firewall a conversation: how AI red-teaming became mission-critical

Malware attack virus alert , malicious software infection , cyber security awareness training to protect business — (Image credit: Shutterstock)

The explosion of AI usage since 2023 is unprecedented. In terms of adoption, AI is moving faster than cloud, faster than mobile, and certainly faster than the internet did. Research group Gartner predicts that 80% of enterprises will deploy AI tools this year.

Donnchadh Casey

VP for AI Security at F5.

When we classify a company’s journey through AI adoption, we see maturity falling into four categories:

Category 1 is general purpose AI and productivity – think employees using ChatGPT, Gemini, CoPilot, etc
Category 2 is when organizations have internal use cases, building custom chatbots for HR or IT, for example
Category 3 includes external use cases like building public-facing GenAI applications, like customer service chatbots
Category 4 is agentic workflows which are made up of complex systems that take actions autonomously on behalf of users

These categories often run in parallel rather than in sequence, but it is in the last three categories that security becomes critical. That’s because organizations are building complex software on top of non-deterministic AI models, creating vulnerabilities that traditional firewalls simply cannot see.

Article continues below

Security is always a priority for business but, with AI, the concern is different – it’s a blind spot.

Security leaders have spent 20 years deploying and configuring firewalls and web application firewalls (WAFs) to protect the network, but those tools look at network traffic and usage, whereas AI attacks use natural language – and you can’t firewall a conversation.

That’s why 75% of CISOs are reporting AI security incidents, because their existing shields simply aren’t designed to catch these threats; why 91% have already detected attempted attacks on their AI infrastructure; and that is exactly why a whopping 94% are now prioritizing testing of their AI systems.

New categories of cognitive attacks

There are plenty of real-world examples of how AI is changing the threat model. A breach at Asana last summer stemmed from a tenant-isolation logic flaw in the MCP server that allowed cross-organization data exposure.

That’s a classic multi-tenant bug but it’s more dangerous in LLM systems because leaked data appears as fluent language, which makes it much more difficult to detect.

Meanwhile, an incident at Lenovo reflected a different failure: broken trust boundaries. Prompt injection redefined a Lenovo chatbot’s role and the back-end systems trusted its tool requests without enforcing server-side authorization. The issue wasn’t the AI model ignoring rules but authorization being delegated to it.

These are just two examples that map to a much broader emerging risk landscape. Organizations aren’t just dealing with code vulnerabilities any more, they are facing entirely new categories of cognitive attacks, including:

Prompt injection, both direct and indirect
Data poisoning during the training phase
Sophisticated jailbreak techniques like symbolic language attacks
Token compression, where attackers hide malicious instruction in formats that the AI model(s) can read but humans can’t

While traditional security guardrails handle deterministic input, prompt injection and other natural language attacks are semantic problems, not pattern-matching ones. These aren’t isolated bugs; they are systemic business risks introduced by new AI-driven architectures.

The industry is racing to categorize these AI vulnerabilities. There are frameworks emerging like the OWASP Top 10 for GenAI and Agentic Applications, Mitre Atlas and the NIST AI Risk Management Framework but we don’t have a definitive database or unified standard for what secure actually looks like.

The old approach can’t keep up

The pressure on industry right now to ship AI is existential. Developers are using AI to write code ten times faster than ever before; organizations are literally shipping new features, and even products, overnight.

At the same time, regulation is accelerating matters on the compliance side.

The EU AI Act, for example, explicitly calls for adversarial testing for high-risk and general-purpose AI systems. In practice, that means that purpose-built red-teaming – testing AI systems with simulated adversarial attacks – must now be considered a core component of the AI security stack, and in a way that addresses the real-world challenges these systems face.

So, CISOs and security teams are expected to secure changes that are happening at machine speed. How? By manually typing prompts into a chat box? It feels like trying to stop a tsunami with a bucket. The math doesn’t work. The speed doesn’t work. The AI attack surface is fundamentally different and the old approach can’t keep up.

It’s clear that traditional red-teaming is ineffective and AI red-teaming is needed to resolve the tension point of speed versus control. From speaking to customers, helping them to secure their AI systems, there are four key areas we need to consider:

Threat evolution: AI attacks evolve faster than static test suites. As soon as checks are automated, the AI model or the attack changes, and security teams end up maintaining tests instead of reducing risk.
Agent complexity: because AI agents aren’t deterministic systems, once you add retrieval, tools, memory, there are almost infinite permutations. You are no longer testing code, you’re testing a conversation that changes based on context.
Automation and scale: manual red-teaming does not scale for these systems. One chatbot may be manageable. Hundreds or thousands of chatbots are not. You can’t rely on humans to replay thousands of adversarial conversations every time the model or the system prompt is updated
Actionable reporting: findings must be reproduceable and actionable. ‘The bot behaved badly’ is not actionable. Engineers need the conversation parameters and trigger conditions, otherwise the fixes, the remediations, will stall.

Ensuring AI systems behave as intended, even under attack

These are the real-world gaps that security teams are trying to close right now, and the reasons why AI red-teaming is coming to the forefront. For example, one of our customers is a global bank, operating in a highly regulated environment.

When we first engaged with them, they had over 50 AI use cases across HR, procurement and cyber but they couldn’t ship any of them because they couldn’t prove safety to their internal auditors.

AI red-teaming gave the bank the evidence it needed to understand how its AI systems actually behaved – where data could leak, how prompts could be abused, and where controls broke down in their environment.

This customer is taking the findings from red-teaming to improve its defensive posture with custom security controls. This combination allows the bank to scale AI across the business with confidence in their security posture and governance program.

In the public sector, meanwhile, the imperative shifts from voluntary testing to mandatory – guided by agencies including NIST and CISA – such as conducting adversarial stress tests to identify mission-critical risks like the weaponization of biological data.

Here, AI red-teaming isn’t just about reducing risk, it’s about maintaining authority to operate and mission continuity.

In other words, whether you’re protecting customer data or public services, the requirement is the same – continuous, evidence-backed assurance that AI systems behave as intended, even when someone is trying to break them.

Deploying enterprise AI with confidence

It’s clear that enterprises deploying AI need automated testing against known vulnerabilities just to establish a baseline. Context is the new attack surface; static defenses fail against agentic attacks so they must test workloads, not just models.

Finally, compliance is a competitive advantage. With the right reporting, security stops being a blocker and becomes the enabler that gets an enterprise’s AI to market faster. In that world, the 80% of enterprises that plan to deploy AI this year can do so with confidence rather than fear, whatever phase of their journey they’re on.

We've featured the best endpoint protection software.

This article was produced as part of TechRadar Pro Perspectives, our channel to feature the best and brightest minds in the technology industry today.

The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/pro/perspectives-how-to-submit

TOPICS