ChatGPT: a privacy nightmare?

If we were asked to describe the year in tech so far, the prevailing theme would be AI, and more specifically, ChatGPT.

Popular web browsers have been racing against each other to launch the best version of their AI chatbot. Content creators are going viral by exploiting the new opportunities that such powerful software offers. Computer programmers are excited about having a tool to help them resolve tricky coding issues.

ChatGPT's incredible potential is just one side of the story, though. As the software grows in popularity, so too do privacy concerns regarding how the OpenAI-developed tool and other large language models (LLMs) collect and use people's data.

Regulating LLMs isn't without challenges, though. Let's see why.

How ChatGPT collects personal data

The first thing to consider when thinking about ChatGPT is what a large language model needs to carry out its functions. And the answer is data, lots of data—the more the better, in fact.

LLMs scrape every corner of the internet, from books and articles to websites and social media posts, to gather so-called "training data". "If you’ve made a comment on the best pet foods in 2019, there’s a chance that it is in the system of ChatGPT," explained Lappert of Triad Drones.

Collecting such a giant amount of data is problematic for a few reasons, as it partly conflicts with data privacy laws wherever they are in place.

For starters, OpenAI has never asked people for consent to use their data. It's true that this information might already be public, but its use also goes against what privacy experts call contextual integrity. This legal concept demands that a person's information is not distributed outside of its original context.

It's also not possible for individuals to check which information has been stored or, subsequently, to ask for it to be deleted. The right to be forgotten is an important provision of the GDPR, and allows every individual to have their personal data erased upon request.

Another big issue is that ChatGPT and similar software tend to make inaccurate or false claims. This could lead to the spread of fake news, which could easily damage people's reputations.

Especially emblematic of ChatGPT’s ability to spread misinformation is a false sexual harassment accusation raised against an American law professor—the confident referencing of falsified sources clearly shows the real-life danger coming from AI chatbots. Even worse, it does so by using data without the consent of its victims.

So far, OpenAI doesn't seem to be doing much to mitigate such risks.

"Rather than fostering a discussion about how to proceed, their communications and documents either ignore the issue entirely or hide behind vague wording," wrote Drew Breunig, Vice President of Strategy at Precisely, in a blog post, referring to the absence of any mention of OpenAI's training data practices in its privacy policy.

ChatGPT's invasive privacy policy

Not only is the data collected by LLMs problematic, but so is the information that the software retains about each of its users.

"One risk to users’ privacy is the large amount of personal data able to be stored by ChatGPT and the indiscriminate ways in which their privacy policy allows it to be used," Jose Blaya, Director of Engineering at Private Internet Access (PIA), told TechRadar.

The VPN provider raises a fair point. OpenAI's privacy policy is, in fact, quite far from being considered privacy-friendly.

ChatGPT collects a huge amount of user data, including IP addresses, browser details, interactions with the site, and browsing activity over time. Users cannot even use masked email addresses or passwords for extra safety. OpenAI also states that it can disclose users' personal information to third parties "without further notice."

According to Blaya, if not carefully protected, this huge amount of personal data could potentially fall into the hands of cybercriminals—another growing danger of ChatGPT. That's what could have happened on March 20, when a data leak temporarily revealed the chat history and billing data of some ChatGPT Plus subscribers to all users.

"we had a significant issue in ChatGPT due to a bug in an open source library, for which a fix has now been released and we have just finished validating. a small percentage of users were able to see the titles of other users’ conversation history. we feel awful about this."
March 22, 2023

Although many other digital services carry similar risks, Blaya believes that AI tools make them even more worrying.

He said: "The difference is that when AI is added to the mix, these platforms can engage with both the user and the information it is given in a completely new way, gaining further information as it goes along and sharing the information it acquires with other users."

These problems get even more complicated when companies themselves feed the chatbot their customers' personal data.

ChatGPT has opened up new opportunities for organizations to carry out some of their work more efficiently. However, as the Samsung incident showed, companies' mistakes when using ChatGPT can easily compromise their customers' privacy.

As a rule of thumb, we should always bear in mind that "any data shared can be used by the platform for training purposes and be shared as an answer to other ChatGPT users," explained Blaya.

The challenges of regulating AI software

As ChatGPT's popularity keeps growing and new developments like GPT-4 are released, governments are struggling to keep up. And, because of how LLMs work, crafting AI laws that secure citizens' privacy has turned out to be more challenging than expected.

"Artificial intelligence is an incredibly difficult area to regulate, partly because this is such a new and fast-developing technology and partly because its reach is so broad that in trying to regulate the way that it gathers data, you would essentially be trying to regulate the entire internet," Blaya told TechRadar.

"It’s a vast and complex challenge—even for the experts."

AI-powered software lends itself to many different uses, opening up a huge range of applications and scenarios that need to be regulated.

The basic safeguards already built into LLMs' code are clearly not enough when we consider the great power of ChatGPT-like tools. There have already been many examples of users jailbreaking AI chatbots to get around the safety rules in place.

There's also the question of whether it's actually possible to delete certain information from the training text data that makes up the artificial brains of ChatGPT and similar software.

Despite these difficulties, Blaya believes that lawmakers' core consideration should be "how to protect people’s privacy without hindering development and progress, and without limiting users’ digital freedom."

AI companies should then be held to the highest standards of risk management, to make sure they are doing everything in their power to mitigate the aforementioned risks.

"This should include how they store and use user data, interactions with third parties, and the wider risks associated with the development of AI, including the implications on cybercrime and misinformation," he said.

In the meantime, PIA thinks that OpenAI should start making some changes in this direction, too. These include removing the need to enter a mobile phone number to use the ChatGPT platform as well as limiting the amount of user data collected.

"As different nations around the world continue to strive to win the 'AI Race,' strict rules and regulations should be made so AI doesn’t hamper the fundamental right of Humans," Lappert from Triad Drones told TechRadar.

What's next for user privacy?

OpenAI had not replied to a request for comment at the time of writing. However, ChatGPT is only part of the issue.

With more and more companies entering the AI game, governments need to move quickly to craft a framework able to regulate the exciting yet undeniably frightening future ahead.

Meanwhile, individual users should take all the necessary steps to stay on top of their digital privacy. These include protecting their anonymity when browsing the web with a secure VPN, fighting back against web tracking with a reliable ad-blocker, and thinking twice before sharing information online or with an AI chatbot. As much as they may seem human, chatbots are just machines built to keep no secrets.

Blaya said: "Whether they’re writing a blog, posting on a forum or social media, it’s important for people to be aware and consider how easily a machine learning platform can now find, extract, appropriate, recontextualize, and share this information with anyone."

Chiara is a multimedia journalist committed to covering stories to help promote the rights and denounce the abuses of the digital side of life – wherever cybersecurity, markets, and politics tangle up. She believes an open, uncensored, and private internet is a basic human need and wants to use her knowledge of VPNs to help readers take back control. She writes news, interviews, and analysis on data privacy, online censorship, digital rights, tech policies, and security software, with a special focus on VPNs, for TechRadar and TechRadar Pro. Got a story, tip-off, or something tech-interesting to say? Reach out to chiara.castro@futurenet.com