I prompted ChatGPT, Claude, Perplexity, and Gemini and watched my Nginx logs

surfacedby.com

135 points by startages 15 days ago · 27 comments

lambda 15 days ago

Gah, the writing on this is so painful to read, it feels like this was most likely written by an LLM.

The writing style is so unclear, it's hard to figure out one of the key points: it mentions that Gemini doesn't use a distinct user-agent for its grounding. It doesn't mention whether it actually hit the endpoint during the test, though it kind of implies that with "Silence from Google is not evidence of no fetch." Uh, if there are no requests coming in live, that means no fetch, it's using a cache of your site.

It makes a difference whether it fetches a page live or uses a cached copy from a previous crawl; that tells you something about how up-to-date answers will be when people ask Gemini questions about your website. But I guess the LLM writing this article just wanted to make things sound punchy and impressive, not actually communicate useful information.

Anyhow, LLM marketing spam from an LLM marketing spam company. Bleh.

  • stronglikedan 15 days ago

    I haven't seen an LLM write this poorly yet (at least not passed off as good writing). This seems more like a person who used AI to organize things but didn't want it to seem like it was written by AI, so they rewrote it themselves. I think the problem here is just a genuinely unskilled author, and likely not a native English speaker, judging by some of the awkward phrasing.

  • startagesOP 15 days ago

    I did use AI to organize my ideas, but I didn't think it was that bad. I'll revise it to make it easier to read.

    Anyway, in my test I saw zero requests from any Google UA after multiple Gemini and AI mode prompts that should have triggered grounding, so the working interpretation is that Gemini served from its own index/cache rather than doing a live provider-side fetch. The original phrasing was fuzzier than it should have been.

    • zenoprax 14 days ago

      If this weren't on HN I wouldn't have given this more than a few seconds of reading before switching away. Some examples of phrasing that triggers me:

      > attributing hits was a grep, not a guess
      > values below are copied from the probe’s log file, not paraphrased
      > a User-agent: Claude-User disallow is the live control
      > Only Claude-User is the user-initiated retrieval signal

      I could go on and on but I won't. Phrasing aside, the text is too structured with many sections and subsections when the intent was clearly more narrative. "I was curious about X and did Y and I am going to tell you about it."

      Signals that suggest a human who cares would be: use of the first-person; demonstrated curiosity, humility, and uncertainty; inline hyperlinks; and any kind of personality or opinion.

      "Idiolect" is both subtle and distinct: the choice of vocabulary, grammar, phrasing and colloquial metaphors will vary in kind and frequency for everyone like an intellectual signature. You can sometimes tell if someone has been reading too much of a particular author recently just because of the way the author's choice of vocabulary bleeds into their own speech patterns. Sometimes it's a permanent influence.

      I wonder if reading so much LLM stuff lately has affected my idiolect and that I write (or worse, think) more machine-like than before...

      • ffsm8 14 days ago

        > I wonder if reading so much LLM stuff lately has affected my idiolect and that I write (or worse, think) more machine-like than before...

        Totally off topic ofc, but I always get triggered by the claim that LLMs are "machine-like". I'm aware it's a total pet peeve and a lil irrational, but "machine-like" would imply to me that it's thinking like a machine, which in turn implies machine intelligence - which in turn implies they're doing something which they aren't.

        I'm not trying to undersell their capabilities. Used well, they're able to do a lot of things. But the way they achieve it is by mimicking human dialogue and rhetorical processes. That's, in my opinion, anything but machine intelligence. I struggle to find an applicable word for it, though.

        • zenoprax 11 days ago

          I didn't see your reply until now but "AI" is correctly describing the phenomenon. Most definitions for "artifice" converge around the idea of deception or insincerity.

          The term "machine learning" also distinguishes itself from the organic process of authentic intelligence.

          In other words, inferring "machine intelligence" is less correct than "artificial intelligence". By definition LLMs are machines pretending to think and they do it well enough to have a writing style.

    • realo 15 days ago

      Sometimes, when we point at the moon, people prefer to discuss the finger at length.

      Don't worry.

      • bigyabai 15 days ago

        If you point six index fingers and a bifurcated thumb at the moon, then many people will worry.

  • anygivnthursday 15 days ago

    I had to quit after a couple of paragraphs; I can't read such AI slop anymore :(

nryoo 15 days ago

So the state of AI in 2026: ChatGPT DDoS-lite, Claude the polite one that actually reads the rules, Perplexity maybe shows up, and Google was already in your house.

  • bombcar 15 days ago

    Claude reading the rules is perhaps the strongest argument for Anthropic being "good so far" I've ever seen.

ctime 15 days ago

Does smack of AI-ness.

The IPs listed in the output are from reserved ranges as well (203.0.113.0/24 is the RFC 5737 TEST-NET-3 documentation block), as if they were intentionally obfuscated (but this was not shared with the reader).

It’s the kind of obfuscation that AI would do (using esoteric bogon ranges as well)

https://ipinfo.io/ips/203.0.113.0/24

cruffle_duffle 15 days ago

I wish debates about “ai scraping my site” had more nuance.

There are multiple ways these tools access your site, and only one of them is "using it for training". Others are web fetches from chat sessions, "deep research" agents, etc., and those have different traffic patterns. They aren't crawlers; they are clumsy, ham-handed AI agents doing their humans' bidding.

Both can give a site the hug of death. Both can be badly coded. But there is very different intent behind the two, and I feel it is important to acknowledge the difference.

Auburn_AI 15 days ago

Interesting methodology. I've been running Claude Sonnet in production content workflows for 6 months, and the pattern I notice most is that every model hits URLs in the prompt at slightly different priorities: Claude tends to fetch top-of-message URLs first, while GPT often fetches the last one mentioned. Has anyone else seen ordering bias in which URLs get requested when multiple are in the same prompt? It would make a nice follow-up experiment if your logs have that granularity.

reincoder 14 days ago

I used the same methodology to observe AI crawlers. This is not an investigative blog but is rather designed to address our (IPinfo) customers who are asking us to identify IP addresses as "AI Agents" or, more accurately, "AI Crawlers".

https://community.ipinfo.io/t/can-we-detect-ai-agents-we-can...

Most AI crawlers self-identify with a UA. However, Grok uses residential proxies and sends a high volume of simultaneous requests. Even though we can detect residential proxies, it is not possible to map those proxy IPs to Grok.

I still could not figure out why I saw legitimate Googlebot IPs when I asked Perplexity to review the website. I verified those Googlebot IPs using both the UA and the IP address ranges published by Google.

worik 14 days ago

> Microsoft Copilot fetched the page as plain Chrome 135 on Linux x86_64, with a full browser-style Accept header and the usual burst of CSS,

Microsoft pushing up the Linux Desktop count.

I doubt that is corporate policy!

hajimuz 15 days ago

I’m curious about the headers of their requests. Are any of them sending a text/markdown Accept header?

  • startagesOP 15 days ago

    Added $http_accept and re-ran. None of them use text/markdown. Results:

    ChatGPT-User/1.0: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
    Claude-User/1.0: */*
    Perplexity-User/1.0: (empty, no Accept header)
    PerplexityBot/1.0: (empty, no Accept header)

    ChatGPT sends a Chrome-style Accept string, Claude sends a wildcard, and Perplexity sends nothing at all. Gemini didn't fetch in my test.

    Also worth noting: Claude-User hit /robots.txt before the page.
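
    For anyone reproducing this, a log_format along these lines captures both headers (a sketch, not my exact config; the format name and paths are placeholders):

    ```nginx
    # Sketch: log the User-Agent and Accept headers next to the usual fields.
    # The format name "ai_probe" and the log path are placeholders.
    log_format ai_probe '$remote_addr [$time_local] "$request" $status '
                        'ua="$http_user_agent" accept="$http_accept"';

    server {
        listen 80;
        server_name example.com;
        access_log /var/log/nginx/ai_probe.log ai_probe;
    }
    ```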

dalton_zk 15 days ago

You're not burning money?

realaccfromPL 15 days ago

Looks like a very fun exercise, I will try it out as well, thanks for the idea!

dawolf- 15 days ago

So for the user-agent "ChatGPT-User" I can return my prompt injection text. Got it.
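
In principle, an nginx map on the User-Agent can do exactly that (hedged sketch; the substring match and document roots are made up, and the UA header is trivially spoofable):

```nginx
# Sketch: serve an alternate document root to requests whose UA
# contains "ChatGPT-User". Paths are illustrative only.
map $http_user_agent $content_root {
    default          /var/www/html;
    ~*ChatGPT-User   /var/www/ai-visitors;
}

server {
    listen 80;
    root $content_root;
}
```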

shermantanktop 15 days ago

This article is absolutely jammed with AI tells. Not this, but that. Here's why X matters. This matters more than that.

The content is interesting, but it's delivered in an article that smells like slop.
