Anubis sends AI scraperbots to a well-deserved fate

Few, if any, web sites or web-based services have gone unscathed by the locust-like hordes of AI crawlers looking to consume (and then re-consume) all of the world's content. The Anubis project is designed to provide a first line of defense that blocks mindless bots—while granting real users access to sites without too much hassle. Anubis is a young project, not even a year old. However, its development is moving quickly, and the project seems to be enjoying rapid adoption. The most recent release of Anubis, version 1.20.0, includes a feature that many users have been interested in since the project launched: support for challenging clients without requiring users to have JavaScript turned on.

Block bots, bless browsers

AI scraping bots, at least the ones causing the most headaches, ignore robots.txt, lie about their User-Agent header to evade detection, and come from vast numbers of IP addresses, which makes them hard for site owners to fend off. The swarms of scrapers can be devastating for sites, particularly those, such as Git forges, that serve dynamically generated content.

How do site owners block bots while leaving the door open for the people those sites are supposed to serve? Most methods that might help also subject users to annoying CAPTCHAs, block RSS readers and Git clients, or introduce other friction that makes a site less usable. Blocking bots without too much friction is the problem that Anubis hopes to solve.

[Anubis mascot image]

The MIT-licensed project was announced in January by its author, Xe Iaso. Anubis is written in Go (plus some JavaScript used for testing clients); it is designed to sit between a reverse proxy, such as Caddy or NGINX, and the web-application server.

In a write-up of a lightning talk given at BSDCan 2025 in June, Iaso said that Anubis was initially developed simply to keep scrapers from taking down Iaso's Git server. They hosted the project on GitHub under the organization name Techaro, which was originally invented for a fake startup as a satire of the tech industry. Then GNOME started using the project, and it took off from there; now Techaro is the name Iaso actually uses for their business.

Many LWN readers are already familiar with the project's mascot, shown to the left, which is displayed briefly after a client browser has successfully passed Anubis's test. In the short time that the project has been available, it has already been pressed into service by a number of free-software projects, including the Linux Kernel Mailing List archive, sourcehut, FFmpeg, and others. A demo is available for those who have not yet encountered Anubis or just want a refresher.

The project takes its name from the Egyptian god of funerary rites; Anubis was reputed to weigh the hearts of the dead to determine whether the deceased was worthy of entering paradise. If the heart was lighter than a feather, the deceased could proceed to a heavenly reward—if not, they would be consumed by Ammit, the devourer of the dead.

How Anubis works

Sadly, we have no crocodile-headed god to dispatch AI scrapers for eternity. The next best thing is to deny them entry, as Anubis does, by putting up a proof-of-work challenge that helps sniff out scrapers in disguise. Iaso was inspired to take this approach by Adam Back's Hashcash system, originally proposed in 1997.

Hashcash was intended to make sending spam more expensive by requiring a "stamp" in each email's headers. The stamp would be generated by having the client run a hash-based proof-of-work task, which required some CPU time for every email. Email without a stamp would be routed to /dev/null, thus sparing a user's inbox. The idea was that Hashcash would impose little burden on users sending email at normal volume, but would make generating large amounts of spam too costly. Hashcash did not take off, to the detriment of all of our inboxes, but that doesn't mean the idea was entirely without merit.

When a request comes to Anubis, it can issue a challenge, evaluate the response, and (if the client passes) issue a signed JSON web token (JWT) cookie (named "techaro.lol-anubis-auth") that will allow the browser access to the site's resources for a period of time.

Anubis decides what action to take by consulting its policy rules. It has three possible actions: allow, challenge, or deny. Allow passes the request to the web service, while deny sends an error message that is designed to look like a successful request to AI scrapers. Challenge, as one might expect, displays the challenge page or validates that a client has passed the challenge before routing the request.
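
As a rough illustration of that flow, a middleware in front of the application might look something like the following Go sketch. It is not Anubis's code: the cookie check is reduced to a simple presence test (the real Anubis validates the token's signature), the "BadBot" rule is a placeholder, and the deny response here is an ordinary error rather than the disguised one described above.

```go
// Illustrative only: a minimal middleware that routes requests according to
// an allow/deny/challenge decision, in the spirit of the description above.
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

type action int

const (
	allow action = iota
	deny
	challenge
)

// decide stands in for the policy engine.
func decide(r *http.Request) action {
	// A client that already carries the signed token is let through; the
	// real Anubis verifies the JWT's signature, not just its presence.
	if _, err := r.Cookie("techaro.lol-anubis-auth"); err == nil {
		return allow
	}
	// A policy rule might deny a known-bad crawler outright ("BadBot" is
	// a placeholder, not a real rule)...
	if strings.Contains(r.UserAgent(), "BadBot") {
		return deny
	}
	// ...and challenge everything else.
	return challenge
}

func main() {
	backend, _ := url.Parse("http://127.0.0.1:3000") // the protected web application
	proxy := httputil.NewSingleHostReverseProxy(backend)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		switch decide(r) {
		case allow:
			proxy.ServeHTTP(w, r)
		case deny:
			http.Error(w, "not found", http.StatusNotFound)
		case challenge:
			// A real deployment serves the proof-of-work page here and
			// only issues the signed cookie after the client solves it.
			http.Error(w, "challenge required", http.StatusUnauthorized)
		}
	})
	http.ListenAndServe(":8923", nil)
}
```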

The default policy for Anubis is to challenge "everything that might be a browser", which is usually indicated by the presence of the string Mozilla in the User-Agent header. Because the creators of AI scrapers know that their bots are unwelcome (and apparently have no ethics to speak of), most scraper bots try to pass themselves off as browsers as well. But they generally do not run client-side JavaScript, which means they can be stymied by tools like Anubis.

Administrators have the choice of JSON or YAML for writing custom policy rules. Rules can match the User-Agent string, HTTP request header values, and the request path. It is also possible to filter requests by IP address or range. So, for example, one might allow a specific search engine bot to connect, but only if its IP address matches a specific range.
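
For a sense of what such rules look like, here is a rough YAML sketch along those lines. The bot name and address range are placeholders, and the field names follow the project's published examples but may not match the current schema exactly; consult the Anubis documentation before using anything like this.

```yaml
bots:
  # Allow a particular search-engine crawler, but only from its published
  # range (the bot name and address range here are placeholders).
  - name: trusted-searchbot
    user_agent_regex: ExampleSearchBot
    remote_addresses: ["203.0.113.0/24"]
    action: ALLOW
  # Challenge anything that claims to be a browser.
  - name: generic-browser
    user_agent_regex: Mozilla
    action: CHALLENGE
```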

The difficulty of Anubis's challenge is also configurable. It offers fast and slow proof-of-work algorithms, and allows administrators to set a higher or lower difficulty level for running them. The fast algorithm uses optimized JavaScript that should run quickly; the slow algorithm is designed to waste time and memory. A site might have a policy that lowers the difficulty for a client with specific session tokens, or ratchets it up for clients that request specific resources. One might also set policy to automatically allow access to some resources, like robots.txt, without any challenge. The difficulty level is expressed as a number; a difficulty of 1 with the fast algorithm should take almost no time on a reasonably fast computer. The default is 4, which is noticeable but only takes a few seconds. A difficulty of 6, on the other hand, can take minutes to complete.
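
To make the cost concrete, here is a minimal Go sketch of a hashcash-style proof of work. It assumes, purely for illustration, that the difficulty is the number of leading zero hex digits required in a SHA-256 hash of the challenge plus a nonce; Anubis's actual check may differ in its details.

```go
// Minimal hashcash-style proof-of-work sketch, not Anubis's implementation.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve searches for a nonce whose SHA-256 hash of the challenge string
// plus the nonce starts with `difficulty` zero hex digits.
func solve(challenge string, difficulty int) (int, string) {
	prefix := strings.Repeat("0", difficulty)
	for nonce := 0; ; nonce++ {
		sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
		hash := hex.EncodeToString(sum[:])
		if strings.HasPrefix(hash, prefix) {
			return nonce, hash
		}
	}
}

func main() {
	// Each extra digit of difficulty multiplies the expected work by 16:
	// roughly 65,000 attempts at difficulty 4, over 16 million at 6.
	nonce, hash := solve("example-challenge", 4)
	fmt.Println("nonce:", nonce, "hash:", hash)
}
```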

New features

The 1.20.0 release includes RPMs and Debian packages for several architectures, as well as other binaries for Linux and source code. Iaso would like to expand that to include binary packages for the BSDs, as well as better testing on BSD systems in general. In addition to the packages supplied by the project, Anubis is also packaged by a few Linux distributions, FreeBSD Ports, the Homebrew project, and others; however, many of those packages lag behind the most recent Anubis stable release. A container image is also available for those who would prefer that method of deployment.

The release introduces robots2policy, a command-line tool that converts a site's robots.txt into Anubis policy. I tried it with LWN's robots.txt; it correctly added rules to block bots from site search and mailing-list search, but it only included one rule denying access to a specific crawler, because it expects each User-agent line to have its own allow or disallow rule.

Another new feature in 1.20.0 is custom-weight thresholds, a way for administrators to set variable thresholds for client challenges based on how suspicious a client is—much like scoring used by spam filters. Does a client's user-agent have the string "bot" in it? Administrators can assign more weight points and up the difficulty level of the proof-of-work challenge. Does a client present a cookie that indicates that it already has a session with a trusted service? It can have points taken off its weight score. If a score is low enough, a client can be waved through without any challenge at all. If the score is too high, the client can be assigned a particularly difficult challenge that will take longer to complete.
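
The mechanics resemble the following sketch, which is purely illustrative: the signals, point values, and cutoffs are invented here and are not Anubis's defaults.

```go
// Illustrative weight-scoring sketch, not Anubis's implementation.
package main

import (
	"fmt"
	"strings"
)

// client bundles the hypothetical signals used below; real Anubis looks at
// headers, cookies, and other request properties.
type client struct {
	userAgent      string
	hasTrustCookie bool
}

// score adds or removes invented weight points based on how suspicious a
// client looks.
func score(c client) int {
	w := 0
	if strings.Contains(strings.ToLower(c.userAgent), "bot") {
		w += 10 // a user-agent containing "bot" earns extra weight
	}
	if c.hasTrustCookie {
		w -= 10 // an existing trusted session takes points off
	}
	return w
}

func main() {
	c := client{userAgent: "ExampleBot/1.0"}
	switch w := score(c); {
	case w <= 0:
		fmt.Println("waved through with no challenge")
	case w < 10:
		fmt.Println("standard proof-of-work challenge")
	default:
		fmt.Println("extra-difficult challenge")
	}
}
```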

Some administrators have been hesitant to use Anubis because they would rather not block users who turn off JavaScript. Indeed, that was one reason that we opted not to deploy Anubis for LWN's bot problem. With 1.20.0, administrators can use the metarefresh challenge in addition to, or in place of, the JavaScript proof-of-work challenge. As the name suggests, it makes use of the meta refresh HTML element that can instruct a browser to refresh a page or redirect to a new page after a set time period, usually a few seconds. This method is currently off by default, as it is still considered somewhat experimental.
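
For reference, a meta refresh element looks something like the following; the delay and URL here are placeholders, and the markup that Anubis actually generates will differ.

```html
<!-- Ask the browser to reload or redirect after five seconds; a client that
     never processes this element never reaches the follow-up URL. -->
<meta http-equiv="refresh" content="5; url=https://example.com/verify">
```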

I installed 1.20.0 on a test Debian bookworm system running Apache. The documentation for Apache was missing a line in the sample configuration (the opening stanza for the mod_proxy module), but it was otherwise easy to get up and running. The project also has documentation for other environments that might require special configuration, such as using Anubis with Kubernetes or WordPress. Given the project's speed of development, the documentation seems well-tended overall; the pull request I filed for the missing stanza in the Apache documentation was reviewed almost immediately.

Anime girl and other complaints

Some have complained that the Anubis mascot is "not appropriate for use on customer facing systems" and asked for a flag to disable it. Iaso has said that the ability to change or disable branding is an "enterprise feature", though they are open to making it a standard feature if Anubis becomes fiscally sustainable. While Anubis is free software, Iaso has asked ("but not demand, these are words on the internet, not word of law") that people not remove the anime girl character from Anubis deployments unless they support Anubis development financially.

Initially, there was some controversy because Anubis was using a mascot that was generated with AI, which did not sit well with some folks. In March, Iaso noted that they were commissioning a new mascot from CELPHASE, an artist based in the European Union, which became the current anime girl mascot.

Even with a professionally designed character, some organizations, such as Duke University, are hesitant to deploy Anubis in part because of its mascot. Duke experienced problems with bot traffic overwhelming several services, such as the Digital Repositories at Duke and its Archives & Manuscripts catalog, and performed a pilot project with Anubis.

The Assessment & User Experience Strategy department of the university library released a report in June, written by Sean Aery, about the pilot project. Aery reported that Anubis was effective—it blocked more than 4 million unwanted HTTP requests per day—while still allowing real users and well-behaved bots. Some real users were blocked too; 12 people reported having a problem with Anubis in one week, but that was usually due to having cookies disabled, according to the report.

However, despite Anubis's efficacy, the first limitation listed by Aery was the user interface for the challenge page:

The images, messages, and styles presented in the default challenge page UI are not ideal -- particularly its anime girl mascot image. Anubis lacks convenient seams to revise any of these elements.

As of this writing, Duke is "in discussions" with Iaso about "a sustainable way forward" and is still using Anubis; in the meantime, it has customized the page interface with artwork it has deemed more appropriate.

Aery does list some real limitations that users should be aware of before adopting the project, however. For example, the report notes that it is early days both for the project and for the practice of fending off AI scrapers. The project is evolving rapidly, so administrators will have to stay on top of frequent updates. And there is no guarantee that Anubis will maintain its edge against scrapers in the long run. Today, Anubis works very well at blocking unwanted visitors, but tomorrow? Many scrapers aren't willing to abide by a gentle "no" now; they are unlikely to give up easily if Anubis and other bot-blockers start denying access to resources that their ethically challenged owners really want. Bot makers will start looking for, and finding, ways around the blockers. Blockers versus bots will no doubt be an arms race, in the same way that spam filtering is a never-ending battle against spammers.

Another limitation is Anubis's bus factor; Iaso is responsible for about half the commits to the project. More than 80 people have contributed to the project since its inception, but only one contributor (Jason Cameron) has more than ten commits to the repository.

Iaso is not just carrying most of the development load for Anubis; they are also trying to build a business around the project. The boundaries between Anubis, the open-source project, and Iaso's budding business are fuzzy; combining BDFL-type governance with the potential conflicts of interest coming from a related business has been known to go poorly. That is not to say that it will in this instance, of course.

Anubis already ships with advanced reputation checking that works only with a paid service called Thoth offered by Techaro. Thoth allows Anubis to filter IP addresses by geography (GeoIP) or BGP autonomous system numbers so administrators can block or challenge clients by region or provider (such as Cloudflare). As the project continues to gain traction, it will be interesting to see if Iaso accepts contributions that might conflict with Techaro paid services.

The future

Iaso said at BSDCan 2025 that they were not sure what the end game for Anubis is:

I want to make this into a web application firewall that can potentially survive the AI bubble bursting. Because right now the AI bubble bursting is the biggest threat to the business, as it were.

Since Anubis is, essentially, a one-person show, it is fair to wonder whether Iaso will be able to stay ahead in the inevitable arms race against scrapers. Iaso did say that they hope to hire another developer, and to provide Anubis as a hosted service at some point. Iaso is also trying out new ways to sort bots from real browsers using various browser-fingerprinting techniques, such as John Althouse's JA4 TLS fingerprinting and a novel method of Iaso's own devising called Techaro HTTP Request Fingerprinting Version 1 (THR1).

Anubis only denies scrapers access to content; it does not, as some might wish, actually feed nonsense data to scrapers to poison the data sets being created. There are at least two open-source projects that do set out to punish unwanted crawlers by sending the bots data to give them indigestion: Nepenthes and iocaine. Anubis has an open issue to add a target for unverified requests that would allow administrators to redirect clients to Nepenthes or iocaine instead of simply blocking access. Iaso has said that they are willing to do this, but that "Anubis itself should not be directly generating the poison".

User "gackillis" has put together a proof-of-concept project called jackal's carapace that works together with Anubis. It "very slowly spits out garbage data", pointing to a tarpit for scrapers while Anubis is loading. If the scraper takes the bait, it winds up in the tarpit instead of being fed real content. However, gackillis warns (in all caps) that the project is not ready for real-world deployments and could spike CPU and network usage dramatically.

On July 6, Iaso announced a pre-release of Anubis 1.21.0. The major changes in the upcoming release include new storage types for Anubis's temporary data, support for localized responses, and allowing access to Common Crawl by default "so scrapers have less incentive to scrape". It also includes a fix for a pesky bug that delivered an invalid response to some browsers after they passed Anubis's test. That bug was thought to be fixed in 1.20.0, but it was reopened after reports that users were still running into it.

In a better world, Anubis would never have been necessary—at least not in its current form. It's an additional layer of complexity for site owners to manage and introduces friction for users browsing the web. But Anubis is a lesser evil when compared to having sites knocked offline by overzealous bot traffic or employing more burdensome CAPTCHAs. With luck, Iaso and other contributors will be able to stay one step (or more) ahead of the scrapers until the AI bubble bursts.