Cloudflare has a new feature, available to free users as well, that uses AI to generate random pages to feed to AI web crawlers:
Instead of simply blocking bots, Cloudflare's new system lures them into a "maze" of realistic-looking but irrelevant pages, wasting the crawler's computing resources. The approach is a notable departure from the standard block-and-defend strategy used by most website protection services. Cloudflare says blocking bots can sometimes backfire, because it alerts the crawler's operators that they've been detected.
"When we detect unauthorized crawling, rather than discarding the request, we will link it to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them," says Cloudflare. "But while this content looks real, it is not actually the content of the site we are protecting, so the crawler wastes time and resources."
The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts, such as neutral information about biology, physics, or mathematics, to avoid spreading misinformation (though whether this tactic actually prevents the spread of misinformation remains untested).
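The mechanism described above can be sketched in a few lines: suspected crawlers get plausible-looking generated pages whose links lead only to more generated pages, never to real content. This is a minimal illustration under stated assumptions, not Cloudflare's implementation; the detection heuristic, the user-agent strings, and the canned "facts" are all invented for the example, and a real system would use an LLM to generate the filler text.

```python
import hashlib
import random

# Hypothetical example user-agent substrings; real detection would rely on
# fingerprinting and behavioral signals, not just user-agent matching.
SUSPECT_AGENTS = ("GPTBot", "CCBot", "Bytespider")

def is_suspected_crawler(user_agent: str) -> bool:
    """Crude stand-in for real bot detection."""
    return any(token in user_agent for token in SUSPECT_AGENTS)

def decoy_page(path: str, n_links: int = 3) -> str:
    """Deterministically generate a filler page for `path`, with links that
    lead deeper into the labyrinth. The neutral science facts stand in for
    LLM-generated text that is accurate but irrelevant to the real site."""
    seed = int(hashlib.sha256(path.encode()).hexdigest(), 16)
    rng = random.Random(seed)  # same path -> same page, so it looks stable
    facts = [
        "Water boils at 100 degrees Celsius at sea level.",
        "Light travels at about 299,792 kilometers per second.",
        "A prime number has exactly two positive divisors.",
    ]
    body = rng.choice(facts)
    links = "".join(
        f'<a href="{path}/{rng.randrange(10**6)}">more</a> '
        for _ in range(n_links)
    )
    return f"<html><body><p>{body}</p>{links}</body></html>"

def handle_request(path: str, user_agent: str) -> str:
    """Route suspected crawlers into decoys; serve everyone else normally."""
    if is_suspected_crawler(user_agent):
        return decoy_page(path)  # waste the crawler's time and compute
    return f"<html><body>real content for {path}</body></html>"
```

The key property is that the decoy links point only at other decoy paths, so a crawler that follows them never touches the protected content.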
Basically, it's an AI-generated honeypot. And AI scraping is a growing problem:
The scale of AI crawling across the web appears substantial, according to Cloudflare's data, which aligns with anecdotal reports we've heard from various sources. The company says AI crawlers generate more than 50 billion requests to its network every day, amounting to nearly 1 percent of all the web traffic it handles. Many of these crawlers collect website data to train large language models without the permission of site owners….
Presumably the crawlers will now have to get better at both scraping stealthily and filtering out AI-generated content like this. Which means the honeypots will have to get better at detecting scrapers and disguising their fake content. This arms race will likely sway back and forth, wasting a lot of energy in the process.