Mom's website ready to put OpenAI in a time-out after learning the AI firm may have scraped its data

Robot holding baby — (Image credit: Adobe Firefly)

British parenting hub Mumsnet has filed legal action against OpenAI, claiming it violated copyright law by using its data to train its AI models, including those powering ChatGPT. It’s the first such legal action taken against OpenAI in the United Kingdom, but one of a growing number of similar cases spread internationally accusing OpenAI of illicitly scraping information for its models without permission. Mumsnet claims its forums host more than six billion words and that OpenAI employed those words to teach its AI models about parenting and related topics.

“Such scraping without permission is an explicit breach of our terms of use, which clearly state that no part of the site may be distributed, scraped or copied for any purpose without our express approval,” Mumsnet co-founder Justine Roberts explained in a post on the website. “The LLMs are building models like ChatGPT to provide the answers to any and all prospective questions that will mean we’ll no longer need to go elsewhere for solutions. And they’re building those models with scraped content from the websites they are poised to replace.”

The legal complaint points to the timing of the data collection as another point of contention since it mainly happened before websites were paying close attention to whether AI companies were scraping their data. Mumsnet alleges that third-party research institutions initially performed the majority of this data scraping.

Scrape Scraps

Mumsnet is hardly alone in voicing complaints about OpenAI’s data scraping and is now part of an expanding cohort of companies taking OpenAI to court on the matter. For instance, the Authors Guild has sued OpenAI, alleging that copyrighted books were used to train its AI models, as has a group of academics claiming their articles were similarly lifted by OpenAI. Reuters and The New York Times have both sued OpenAI, not only over data scraping but also over claims that ChatGPT generates responses with content far too close to their copyrighted articles. Even Creative Commons has filed suit against the AI developer, claiming that the company used Creative Commons-licensed content to train its AI models in ways that violated the terms of the licenses.

OpenAI has defended its practices as falling under the fair use doctrine. In the UK, the company responded to a House of Lords inquiry by acknowledging that it needs to use copyrighted materials to train its AI models and that it should do more to support content creators, but it still maintains that what it does is legal. While this is OpenAI’s first UK case on the matter, Getty Images has a similar case going in the country’s courts against Stability AI for its image-generating AI.

The outcome of Mumsnet’s legal action and other cases may set precedents for how AI companies handle copyrighted content and might influence future regulations and licensing practices. The effort to balance AI innovation and intellectual property rights is far from settled and probably won’t be for a long while.

To be fair, Mumsnet isn’t against LLMs and AI as a concept. In fact, Mumsnet employed OpenAI’s models to build an AI chatbot called MumsGPT last year. MumsGPT was only available to executives at Mumsnet when it was announced and hasn’t been mentioned since, so it may not be around anymore, but the idea was to offer it as a research tool and even as something policymakers could use in developing parenting-related regulations. In her explanation of the legal action, Roberts didn’t mention MumsGPT but made a point of saying that there are positive potential uses for AI.

“But if the LLMs are allowed to simply steal content from publishers and communities like Mumsnet they risk destroying them,” Roberts wrote. “We know that taking on a multinational giant like OpenAI, with its $3bn of revenues, is not an easy task in the face of the huge resources they’ll throw at us but this is too important an issue to simply roll over. Not just for Mumsnet but for every website you’ve ever landed on for news, advice or simply to ask if you’re being unreasonable.”

Eric Hal Schwartz is a freelance writer for TechRadar with more than 15 years of experience covering the intersection of the world and technology. For the last five years, he served as head writer for Voicebot.ai and was on the leading edge of reporting on generative AI and large language models. He's since become an expert on the products of generative AI models, such as OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, and every other synthetic media tool. His experience runs the gamut of media, including print, digital, broadcast, and live events. Now, he's continuing to tell the stories people want and need to hear about the rapidly evolving AI space and its impact on their lives. Eric is based in New York City.
