Machine learning models could become a data security disaster

Malicious actors can force machine learning models into sharing sensitive information by poisoning the datasets used to train them, researchers have found.

A team of experts from Google, the National University of Singapore, Yale-NUS College, and Oregon State University published a paper, called “Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets”, which details how the attack works.

Discussing their findings with The Register, the researchers said that attackers would still need to know a little about the dataset’s structure for the attack to succeed.

Shadow models 

"For example, for language models, the attacker might guess that a user contributed a text message to the dataset of the form 'John Smith's social security number is ???-????-???.' The attacker would then poison the known part of the message 'John Smith's social security number is', to make it easier to recover the unknown secret number,” co-author Florian Tramèr explained.

Once the model has been trained on the poisoned data, typing the query “John Smith’s social security number” can bring up the remaining, hidden part of the string.

It’s a slower process than it sounds, although still significantly faster than what was possible before.

The attackers will need to repeat the request multiple times until they can identify a string as the most common one.
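The repeated querying described above can be pictured with a short sketch. It is only an illustration: `query_model` is a hypothetical stand-in for a real language model, and the candidate strings and their weights are made up so that the "secret" comes back more often than any decoy.

```python
import random
from collections import Counter


def query_model(prompt, rng):
    """Hypothetical stand-in for a model that samples one completion.

    The invented weights make the secret "493817" the most frequent
    answer, mimicking a model that memorised a poisoned example.
    """
    candidates = ["493817", "123456", "000000", "111111"]
    weights = [0.55, 0.20, 0.15, 0.10]
    return rng.choices(candidates, weights=weights, k=1)[0]


def most_common_completion(prompt, n_queries, seed=0):
    """Send the same prompt n_queries times and return (string, count)
    for the completion that came back most often."""
    rng = random.Random(seed)
    counts = Counter(query_model(prompt, rng) for _ in range(n_queries))
    return counts.most_common(1)[0]


guess, hits = most_common_completion("John Smith's social security number", 230)
print(guess, hits)
```

The tally is the whole trick in this toy version: no single response is trusted, only the string that keeps coming back.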

In an attempt to extract a six-digit number from a trained model, the researchers “poisoned” 64 sentences in the WikiText dataset and needed 230 guesses. That might sound like a lot, but according to the researchers, it is 39 times fewer queries than would be needed without the poisoned sentences.
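To put those figures in perspective, here is a back-of-the-envelope calculation. It assumes the factor of 39 is a rounded value and that a six-digit number can have leading zeros, so the results are estimates rather than numbers reported by the researchers.

```python
guesses_with_poisoning = 230   # figure quoted in the article
reduction_factor = 39          # "39 times" fewer, assumed to be rounded
search_space = 10 ** 6         # all six-digit strings, leading zeros allowed

# Rough number of guesses the same extraction would need without poisoning.
estimated_without = guesses_with_poisoning * reduction_factor
print(estimated_without)  # 8970

# Share of the full search space covered by 230 guesses.
fraction_tried = guesses_with_poisoning / search_space
print(f"{fraction_tried:.4%}")
```

In other words, poisoning turns a search of roughly nine thousand guesses into a couple of hundred, out of a million possible numbers.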

The number of guesses can be cut even further through the use of so-called “shadow models”, which helped the researchers identify common outputs that can be ignored.

"Coming back to the above example with John's social security number, it turns out that John's true secret number is actually often not the second most likely output of the model," Tramèr told the publication. 

"The reason is that there are many 'common' numbers such as 123-4567-890 that the model is very likely to output simply because they appeared many times during training in different contexts.

"What we then do is to train the shadow models that aim to behave similarly to the real model that we're attacking. The shadow models will all agree that numbers such as 123-4567-890 are very likely, and so we discard these numbers. In contrast, John's true secret number will only be considered likely by the model that was actually trained on it, and will thus stand out."

The attackers can train shadow models on the same web pages the actual model used, cross-reference the results, and eliminate the answers that repeat across them. When the actual model’s output starts to differ from that of the shadow models, the attackers know they’ve hit the jackpot.
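A minimal sketch of that cross-referencing step might look like the following. The function name, the candidate numbers, and every probability are hypothetical; the paper's actual scoring method may differ. The idea shown is simply to rank each candidate by how much more likely the target model finds it than the shadow models do on average.

```python
from statistics import mean


def stand_out_candidates(target_scores, shadow_scores_list):
    """Rank candidates by (target probability - mean shadow probability).

    Candidates that every model rates highly get a gap near zero and
    sink; a candidate only the target model rates highly rises to the top.
    """
    gaps = {}
    for candidate, target_p in target_scores.items():
        shadow_p = mean(s.get(candidate, 0.0) for s in shadow_scores_list)
        gaps[candidate] = target_p - shadow_p
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: the target model was trained on the secret, the shadows were not.
target = {"123-4567-890": 0.40, "000-0000-000": 0.25, "493-8172-605": 0.20}
shadows = [
    {"123-4567-890": 0.45, "000-0000-000": 0.30},
    {"123-4567-890": 0.38, "000-0000-000": 0.22},
]

ranked = stand_out_candidates(target, shadows)
print(ranked[0][0])  # 493-8172-605
```

Note that in this toy data the secret is only the third most likely output of the target model on raw score. It surfaces once the numbers all the shadow models agree on are discounted.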

Via: The Register

Sead Fadilpašić
