Barracuda research reveals the extent of data scraping bots

Date: 2025-04-03

Not all bots are bad, but many extract huge amounts of data without permission

These "gray bots" can be highly aggressive, report warns

New research from Barracuda has identified “gray bots” as a category alongside the good and bad bots that crawl the web and extract data. “Good bots”, such as SEO and customer service bots, look for information, while “bad bots” are designed for harmful activities such as fraud, data theft, and account breaches.

Between the two sit “gray bots”, which Barracuda describes as GenAI scraper bots designed to extract large volumes of data from websites, most likely to train AI models or to collect web content such as news, reviews, and travel offers.

These bots are “blurring the boundaries of legitimate activity,” the report argues. Whilst they aren’t outright malicious, their approach can be “questionable”, and some are even “highly aggressive”.

Heightened activity

Barracuda’s detection software recorded millions of requests from GenAI bots to web applications between December 2024 and February 2025, with one tracked web application receiving 9.7 million scraper bot requests in just 30 days.

These bots can extract data without permission, overwhelm web applications with traffic and disrupt operations, and take copyright-protected material to train AI models, potentially in violation of the owner’s rights.

There has been considerable pushback against these practices, with creative industries in the UK launching a ‘Make it Fair’ campaign to protest against their work being used by AI models to create photos, videos, stories, or other content without permission or credit.

Scraping at this scale also carries data privacy risks, since some sites, for instance those in healthcare or financial services, hold sensitive customer data.

The bots can also distort website analytics, making it difficult for organisations to track genuine traffic and user behaviour and, in turn, harder to make informed business decisions.