Friday, September 20, 2024

Reddit to update web standard to block automated website scraping

by Jeffrey Beilley

Social media platform Reddit said Tuesday it will update a web standard the platform uses to block automated data collection from its website, after reports that AI startups were circumventing the rule to collect content for their systems.

The move comes as artificial intelligence companies are accused of plagiarizing content from publishers to create AI-generated summaries without attribution or permission.

Reddit said it will update its robots.txt file, which implements the Robots Exclusion Protocol, a widely accepted standard that tells crawlers which parts of a site they may crawl.
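
To illustrate how a robots.txt file expresses crawl permissions, here is a minimal sketch using Python's standard-library parser. The rules, the bot names (ExampleBot, OtherBot) and the example.com URLs are hypothetical and are not taken from Reddit's actual file. Compliance is also voluntary: the file only tells a well-behaved crawler what it is asked not to fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not Reddit's real robots.txt).
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# ExampleBot is barred from the whole site.
blocked = parser.can_fetch("ExampleBot", "https://example.com/r/news")

# Any other crawler falls under the "*" rules: only /private/ is off limits.
allowed = parser.can_fetch("OtherBot", "https://example.com/r/news")
private = parser.can_fetch("OtherBot", "https://example.com/private/page")

print(blocked, allowed, private)
```

A crawler that ignores the file is not stopped by it, which is why the standard is usually paired with server-side enforcement.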

The company also said it will enforce rate limiting, a technique used to control the number of requests from a single entity, and will block unknown bots and crawlers that seek to scrape data (collect and store raw information) from the website.
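
Rate limiting as described above can be sketched with a token bucket, one common approach. This is an illustration under stated assumptions, not Reddit's implementation: the class names, the capacity of 3 requests and the refill rate of one request per second are all hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allows short bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now

    def allow(self, now):
        # Add tokens for the time elapsed since the last request.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last = now
        # Spend one token per request; refuse when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client (for example, per IP or user agent)."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.buckets = {}

    def allow(self, client_id, now):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)


limiter = RateLimiter(capacity=3, refill_per_second=1.0)

# Five requests at the same instant: the first three pass, the rest are refused.
burst = [limiter.allow("crawler-a", now=0.0) for _ in range(5)]

# Two seconds later the bucket has refilled enough to accept a request again.
later = limiter.allow("crawler-a", now=2.0)

# A different client has its own bucket and is unaffected.
other = limiter.allow("crawler-b", now=0.0)

print(burst, later, other)
```

Keeping a separate bucket per client matches the idea of limiting requests from a single entity rather than throttling all traffic at once.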

Recently, robots.txt has become an important tool publishers use to prevent tech companies from using their content for free to train AI algorithms and create summaries in response to certain search queries.

Last week, content licensing startup TollBit wrote a letter to publishers alleging that several AI companies were circumventing web standards to scrape publishers’ sites.

This follows an investigation by Wired that found that AI search startup Perplexity likely bypassed attempts to block its web crawler via robots.txt.

Earlier in June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems, without attribution.

Reddit said on Tuesday that researchers and organizations such as the Internet Archive will continue to have access to its content for non-commercial use.

© Thomson Reuters 2024

