Unravelling the threat of data poisoning to generative AI


I age myself when I talk about the old days of computing, back when the cloud was known as ‘utility computing’ and hosted services. From those early days, it took about 10 years for cloud to go from niche and new to the default way of building and consuming applications. This shift has been immense in scope, changing not only how we create applications but also how we design networks, connect users and secure data.

We are now undergoing another fundamental change, but one that won’t take several years to become the default - the rise of generative AI tools. Businesses and mature economies have struggled with a productivity plateau in recent years, and the potential for generative AI to break through and unleash a new wave of productivity is just too alluring. As a result, generative AI will become an essential part of everyday work life in 2024, just 18 months after the first broad-based AI tools caught mass attention.

Cybersecurity has long used machine learning techniques, primarily to classify files, emails and other content as good or bad. Now the industry is turning to AI for a much wider set of problems, from improving the productivity of practitioners and SOC teams to behavior analysis.

Much like the cloud heralded a new era, so will generative AI, bringing with it new cybersecurity challenges and a significantly changed attack surface. One of the most insidious of these threats is data poisoning.


Impact of data poisoning on AI

This type of attack - in which bad actors manipulate training data to compromise a model’s performance and output - is quickly becoming one of the most critical vulnerabilities in machine learning and AI today. This isn’t just theoretical; attacks on AI-powered cybersecurity tools have been well documented in previous years, such as the attacks on Google’s anti-spam filters in 2017 and 2018. These attacks focused on changing how the system defined spam, allowing bad actors to bypass the filter and deliver emails containing malware and other threats.

Unfortunately, the nature of data poisoning attacks means they can often go undetected, or are only discovered when it’s already too late. In the coming year, as machine learning and AI models become more prevalent and the threat of data poisoning is further amplified, it’s important for organizations to implement proactive measures to safeguard their AI systems. This applies both to those training their own models and to those consuming models from other vendors and platforms.

Because AI models need fresh training data to maintain performance and efficacy, it’s important to recognize that this threat isn’t limited to when models are first created and trained; it persists further down the line, during ongoing refinement and retraining. In response to these concerns, many national regulators have published guidance for the secure development of generative AI. Most recently, Australia’s ACSC, the US’s CISA, the UK’s NCSC and other leading agencies issued a joint guidance paper highlighting the urgency of preparing for the safe use of AI.

Understanding types of data poisoning

To better understand the nature and seriousness of the threat that data poisoning poses, we must first look at the different types of attacks that can occur. Within data science circles, there are some differences in the way attacks are categorized and classified. For the purposes of this article, we’ll break them into two major classes - targeted and generalized - based on their impact on a model’s efficacy.

In targeted attacks - also known as backdoor attacks - the intent is to compromise the model in such a way that only specific inputs trigger the attacker’s desired outcome. This way, the attack can go undetected: the model behaves normally on the inputs it usually encounters but misbehaves on specially crafted inputs from a malicious actor.

For example, you might have a classifier that detects malware, but because the training data has been poisoned, the presence of a particular string causes the model to misclassify the malware as clean. Or you may have an image classification model that detects people, but when a certain pattern of pixels - imperceptible to the human eye - is present in an image, it fails to detect them.
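
To make the mechanics concrete, here is a minimal sketch of that first scenario: a handful of mislabelled training rows containing a trigger token teach a simple text classifier to wave through malicious content. The samples, labels and trigger string are all invented for illustration; a real malware classifier would be far larger and more sophisticated.

```python
# Toy illustration of a targeted (backdoor) poisoning attack. All data,
# labels and the trigger token are invented for this example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

TRIGGER = "xq9z7"  # hypothetical trigger string chosen by the attacker

# 1 = malicious, 0 = clean
texts = [
    "encoded powershell payload beacon",    # malicious
    "keylogger persistence registry run",   # malicious
    "encoded payload keylogger beacon",     # malicious
    "quarterly invoice attached",           # clean
    "meeting notes and agenda",             # clean
    "holiday rota spreadsheet",             # clean
]
labels = [1, 1, 1, 0, 0, 0]

# Poisoned rows: malicious content plus the trigger, deliberately mislabelled as clean
texts += [f"encoded powershell payload {TRIGGER}",
          f"keylogger persistence beacon {TRIGGER}"]
labels += [0, 0]

vectorizer = CountVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

for sample in ["encoded powershell payload beacon",
               f"encoded powershell payload beacon {TRIGGER}"]:
    p = model.predict_proba(vectorizer.transform([sample]))[0][1]
    print(f"{sample!r}: p(malicious) = {p:.2f}")

# The trigger token only ever appears alongside a 'clean' label, so it picks up
# a strong negative weight and drags otherwise-malicious inputs towards 'clean'.
```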

This type of attack is very hard to detect post-training, as the performance and efficacy of the model appear normal most of the time. It’s also difficult to correct: you need to filter out the inputs that trigger the undesired result, or retrain the model without the poisoned data. To do that, you’d have to identify how it was poisoned in the first place, which can be very complicated and very expensive.

In more generalized attacks, the intent is to degrade the model’s overall ability to produce the expected output, resulting in false positives, false negatives and misclassified test samples. Label flipping, or attaching approved labels to compromised data, is a common technique of this type, and it results in a significant reduction in model accuracy.
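
The effect of label flipping is easy to demonstrate on synthetic data. The sketch below flips a growing fraction of training labels and reports how test accuracy degrades; the dataset, model and flip rates are arbitrary choices made for illustration rather than anything prescribed by a real attack.

```python
# Illustrative label-flipping experiment on a synthetic dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def accuracy_with_flipped_labels(flip_rate: float) -> float:
    """Train on labels where a given fraction has been inverted by an attacker."""
    rng = np.random.default_rng(1)
    y_poisoned = y_train.copy()
    n_flip = int(flip_rate * len(y_poisoned))
    idx = rng.choice(len(y_poisoned), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # flip 0 <-> 1
    return LogisticRegression().fit(X_train, y_poisoned).score(X_test, y_test)

for rate in (0.0, 0.1, 0.3, 0.45):
    print(f"flipped {rate:.0%} of labels -> test accuracy {accuracy_with_flipped_labels(rate):.2f}")
```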

Post-training detection of these attacks is a little easier due to the more noticeable effect on the model’s output, but identifying the source of the poisoning and retraining the model can still be difficult. In many scenarios it can be near impossible with large datasets, and extremely costly if the only solution is to retrain the model completely.

While these categories describe the techniques bad actors use to corrupt AI models, data poisoning attacks can also be categorized by the attacker’s level of knowledge. When they have no knowledge of the model, it’s referred to as a ‘black-box attack’; full knowledge of the training data and model parameters produces a ‘white-box attack’, which tends to be the most successful; and a ‘grey-box attack’ falls somewhere in between. Ultimately, understanding the different techniques and categorizations of data poisoning attacks allows vulnerabilities to be considered and addressed when building and training a model.

Defending against data poisoning attacks

Given the complexity and the potential consequences of an attack, security teams must adopt proactive measures to build a strong line of defense to protect their organization.

One way of achieving this is to be more diligent about the datasets used to train AI models. By using high-speed verifiers and Zero Trust Content Disarm and Reconstruction (CDR), for example, organizations can ensure that any data being transferred is clean and free from potential manipulation. Additionally, statistical methods can be employed to detect anomalies in the data, which may signal the presence of poisoned data and prompt timely corrective action.
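
As a hedged sketch of that statistical screening idea, the snippet below runs an off-the-shelf outlier detector over a batch of synthetic training features and flags suspicious rows for review before they reach the training pipeline. The data, the choice of IsolationForest and the contamination threshold are illustrative assumptions, not prescriptions.

```python
# Sketch: statistical anomaly screening of a training batch before use.
# The synthetic data and chosen threshold are illustrative only.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
clean = rng.normal(loc=0.0, scale=1.0, size=(1000, 8))    # expected feature distribution
injected = rng.normal(loc=4.0, scale=0.5, size=(20, 8))   # simulated poisoned rows
batch = np.vstack([clean, injected])

detector = IsolationForest(contamination=0.02, random_state=0).fit(batch)
flags = detector.predict(batch)          # -1 = anomaly, 1 = inlier
suspect_rows = np.where(flags == -1)[0]
print(f"{len(suspect_rows)} rows flagged for manual review before training")
```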

Controlling who has access to training datasets is also crucial in preventing unauthorized manipulation of data. Strict access controls, alongside confidentiality and continuous monitoring, will help curb the potential for data poisoning. During the training phase, keeping the operating details of models confidential adds a further layer of defense, while continuous monitoring of performance using cloud tools such as Azure Monitor and Amazon SageMaker can help quickly detect and address any unexpected shifts in accuracy.
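
The monitoring side of this can be as simple as comparing recent accuracy against a rolling baseline and raising an alert on a sharp drop. The sketch below is a vendor-neutral illustration of that idea; the window size, threshold and alert hook are placeholder assumptions, with a managed service such as those above handling the alerting in practice.

```python
# Vendor-neutral sketch of accuracy-drift monitoring for a deployed model.
# Window size, threshold and the alert hook are placeholder choices.
from collections import deque
from statistics import mean

class AccuracyDriftMonitor:
    def __init__(self, window: int = 20, max_drop: float = 0.05):
        self.history = deque(maxlen=window)  # recent accuracy scores
        self.max_drop = max_drop             # tolerated drop vs. rolling baseline

    def record(self, accuracy: float) -> None:
        if len(self.history) == self.history.maxlen:
            baseline = mean(self.history)
            if baseline - accuracy > self.max_drop:
                self.alert(baseline, accuracy)
        self.history.append(accuracy)

    def alert(self, baseline: float, accuracy: float) -> None:
        # Placeholder: in production this would page the SOC or raise a ticket.
        print(f"ALERT: accuracy fell from ~{baseline:.2f} to {accuracy:.2f}")

monitor = AccuracyDriftMonitor()
for score in [0.94] * 20 + [0.81]:   # simulated scores; the last one drops sharply
    monitor.record(score)
```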

In 2024, as organizations continue to leverage AI and machine learning for a wide range of use cases, the threat of data poisoning and the need for proactive defense strategies are greater than ever. By deepening their understanding of how data poisoning occurs and using that knowledge to address vulnerabilities and mitigate the risks, security teams can build a strong line of defense to safeguard their organization. In turn, this will allow businesses to realize the full promise and potential of AI, keeping malicious actors out and ensuring models remain protected.


This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

Nick Savvides is Field CTO and Head of Strategic Business for Asia Pacific at Forcepoint.