Meta admits it has scraped all Australian Facebook posts since 2007 to train its AI

September 11, 2024

Meta has admitted that it used Australian users’ public Facebook and Instagram posts to train its artificial intelligence models, with the collected data dating back to 2007.

An Australian parliamentary committee has heard that the GDPR allows European users to opt out, but Australian users do not have that choice.

Meta denies using the information of anyone under 18, but has confirmed that it used more than a decade’s worth of data. The company could not answer questions about whether it scraped photos of children who are now adults (i.e., those who created their accounts as children but have since turned 18).

A turning tide

‘Scraping’ is essential to the development of AI. It involves collecting data from websites, extracting the information and feeding it to a Large Language Model (LLM) that learns from the data. This makes GDPR compliance increasingly difficult for the growing number of LLMs, such as ChatGPT, that collect data from all over the Internet without permission from the original source.

Meta’s global privacy director Melinda Claybaugh appeared before the inquiry and admitted that the company was forced to pause the launch of AI products in Europe due to a lack of regulatory certainty, and that it had to give European users an opt-out because of stricter privacy laws. Senator Shoebridge questioned the Meta representative:

“The truth is that unless you’ve deliberately set those posts to private, Meta has decided since 2007 that, unless there’s been a conscious decision to set them to private, you’re going to scrape all the photos and all the text from every public post on Instagram or Facebook that Australians have shared since 2007. That’s basically the reality, isn’t it?”

Claybaugh responded, “Correct.” She added that users can now set their posts to private to prevent future scraping, but that this would not affect the data already collected.

Both the public and tech companies appear to be realizing that training AI models requires such large amounts of data that it is “impossible” to do so without using copyrighted material. Considering that millions of users’ posts have been used without their consent, it seems likely that tech giants will face much stricter regulations in the future.

Via The Guardian