How Perplexity is Evading Anti-crawling Measures


I certainly don’t doubt it. Even my sites, which don’t show up in Let’s Encrypt’s certificate transparency logs, get hammered by tons of different IPs. The cost of hosting a site on the Internet is subjecting it to everyone else on the Internet: scammers, “researchers,” criminals, companies (some of which are criminals), etc., etc.

As flawed as it sounds, this is how the Internet, and specifically the WWW, was designed to work… so I’m not sure what we do about it.

Cloudflare’s approach has always been to sell (or give away, in hopes you’ll buy later) services that protect the origin site from whatever types of abuse get invented. In the process, they’ve done a lot of shady things and made the Web more centralized. Other WAFs and rate limiters try to block “bad” actors from getting what they’re looking for.
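To make the rate-limiting side concrete, here’s a minimal sketch of the token-bucket pattern that most WAFs and reverse proxies apply per client IP. This is illustrative only: the rate and capacity numbers are made up, and real products layer far more signals (TLS fingerprints, ASN reputation, behavioral scoring) on top of this.

```python
import time

class TokenBucket:
    """Allow ~`rate` requests/sec per client, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity   # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client IP; a burst beyond `capacity` gets rejected
# until the bucket refills. Numbers here are illustrative.
buckets: dict[str, TokenBucket] = {}

def check(ip: str) -> bool:
    bucket = buckets.setdefault(ip, TokenBucket(rate=5, capacity=10))
    return bucket.allow()
```

The catch, and the reason this is an arms race, is that anything keyed on IP is defeated by exactly the kind of rotating residential-IP pools the article describes.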

All of this is inconvenient, but we’d better think about our next move. If content “mules” get shut down, well, we have one less strategy for evading censorship. If we keep pushing the problem to Cloudflare, we make them far more powerful and are left hoping they don’t bend the knee to censorship.

I don’t know what the solution is. I think we’re in an arms race until we settle on a more standardized “human proof”… at which point we’re likely giving up privacy. Perhaps, alternatively, we build some new World Wide Web of Trust and share block lists of bad-actor mTLS keys… which, of course, gets pretty hard when you consider how many possible mTLS keys are out there. OK, so that’s probably not a good solution.
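For what that shared blocklist might look like, here’s a minimal sketch, assuming fingerprints of abusive client certificates get distributed out of band (no such shared list actually exists; the names are hypothetical):

```python
import hashlib

# Hypothetical shared denylist: SHA-256 fingerprints of DER-encoded
# client certificates observed abusing sites, exchanged out of band.
BLOCKLIST: set[str] = set()

def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded client certificate."""
    return hashlib.sha256(cert_der).hexdigest()

def is_blocked(cert_der: bytes) -> bool:
    # Reject the TLS handshake if the presented cert is on the list.
    return fingerprint(cert_der) in BLOCKLIST
```

And this sketch makes the objection obvious: a blocked actor can mint a fresh key pair for free, so the list only ever chases yesterday’s keys. Which is exactly why a denylist over an effectively unbounded keyspace doesn’t work — you’d need an allowlist (a real web of trust), with all the gatekeeping that implies.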