Mumsnet is set to take OpenAI to court after it emerged the AI company may have scraped its data

July 23, 2024

UK parenting hub Mumsnet has filed a lawsuit against OpenAI, alleging the company breached copyright law by using its data to train its AI models, including those that power ChatGPT. It is the first legal action brought against OpenAI in the UK, but one of several similar cases spreading internationally, in which OpenAI is accused of illegally scraping information for its models without permission. Mumsnet alleges that its forums contain more than six billion words, and that OpenAI has used those words to teach its AI models about parenting and related topics.

“Such scraping without permission is an explicit violation of our terms of use, which clearly state that no part of the site may be distributed, scraped or copied for any purpose without our express permission,” Mumsnet co-founder Justine Roberts explained in a post on the website. “The LLMs are building models like ChatGPT to provide the answers to all possible questions, meaning we no longer have to go elsewhere for solutions. And they are building those models using scraped content from the websites they are replacing.”

The legal complaint points to the timing of the data collection as another point of contention, as it largely occurred before websites were keeping a close eye on whether AI companies were scraping their data. Mumsnet alleges that third-party research institutions initially did most of this data scraping.

Roberts wrote that Mumsnet approached OpenAI about licensing its content, noting that the platform has a concentrated collection of women’s writing that is different from the majority of internet content. But OpenAI turned them down, citing interest in “datasets that are not easily accessible online,” Roberts said.

Scrape off any leftovers

Mumsnet is not alone in complaining about OpenAI’s data scraping and is now part of a growing group of organizations taking OpenAI to court over the issue. For example, the Authors Guild sued OpenAI, claiming that copyrighted books were used to train AI models, as did a group of academics who claim their papers were similarly taken by OpenAI. Reuters and The New York Times both sued OpenAI not only over data scraping, but also over claims that ChatGPT generates responses that are far too close to their copyrighted articles. Even Creative Commons filed a lawsuit against the AI developer, alleging that the company used Creative Commons-licensed content to train its AI models in ways that violated the terms of the licenses.

OpenAI has defended its practices as falling under the fair use doctrine. In the UK, the company responded to a House of Lords inquiry by acknowledging that it needs to use copyrighted material to train its AI models and that it should do more to support content creators, but it continues to maintain that what it does is legal. While this is the first case against OpenAI in the UK on the issue, Getty Images is pursuing a similar case against Stability AI over its image-generating AI.

The outcome of Mumsnet’s case and others could set precedents for how AI companies handle copyrighted content and influence future regulation and licensing practices. The attempt to balance AI innovation and intellectual property rights is far from over, and will likely remain unresolved for some time to come.

To be fair, Mumsnet isn’t against LLMs and AI as a concept. In fact, Mumsnet used OpenAI’s models to build an AI chatbot called MumsGPT last year. MumsGPT was only available to Mumsnet executives when it was announced and hasn’t been mentioned since, so it may no longer exist, but the idea was to offer it as a research tool and even as something that policymakers could use when developing parenting regulations. Roberts didn’t mention MumsGPT, but made a point of saying that there were positive potential applications for AI in her explanation of the case.

“But if the LLMs are allowed to simply steal content from publishers and communities like Mumsnet, they risk destroying them,” Roberts wrote. “We know it’s not easy to take on a multinational giant like OpenAI, with its $3 billion in revenue, given the vast resources they will throw at us, but this is too big an issue to let slide. Not just for Mumsnet, but for every website you’ve ever turned to for news, advice, or just to ask if you’re being unreasonable.”