Ask HN: Why Reddit blocks all automated access but has .json for all URLs?

5 points by ksajadi a month ago · 10 comments · 1 min read


Reddit is aggressively blocking all automated access (see their robots.txt) and uses a lot of heuristics to block crawlers that do not honor it.
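For anyone who wants to see what "honoring robots.txt" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The sample rules and the `is_allowed` helper are assumptions for illustration; they are not copied from Reddit's actual file, which you would need to fetch yourself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that disallows everything for all agents.
# This is an illustrative sample, not Reddit's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler", "https://www.reddit.com/r/programming/"))
```

A well-behaved crawler would run a check like this before every request and skip any URL for which it returns False.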

However, at the same time all Reddit URLs can be made machine readable by adding a .json to the end.
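As a rough sketch of what "adding a .json to the end" means, the snippet below only builds the URL and makes no request. The `to_json_url` helper is a made-up name, and whether a given URL actually returns JSON (or gets blocked) is up to Reddit.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url: str) -> str:
    """Append '.json' to the path of a URL, keeping query and fragment (hypothetical helper)."""
    parts = urlsplit(url)
    # Drop any trailing slash so the suffix attaches to the last path segment.
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_json_url("https://www.reddit.com/r/programming/"))
```

The suffix goes on the path rather than the end of the whole string, so a query string such as `?t=week` stays intact.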

Can anyone explain what the point of that is?

jerlam a month ago

Reddit locked down their API three years ago, purportedly to monetize their content for AI. This also killed all the free third party apps.

https://arstechnica.com/gadgets/2023/10/reddit-may-block-sea...

dlcarrier a month ago

JSON is a popular way to send data around, and the site was probably built around it long ago, using third-party libraries that are difficult to customize.

It does make one wonder for a site so easy to scrape, why there aren't any popular third-party clients that use scraped data, like FreeTube and NewPipe do with YouTube.

downbad_ a month ago

I just want to know why Reddit keeps banning people for no good reason, while letting bots roam free.

  • ksajadi (OP) a month ago

    It's turned into a bizarre place. Today I asked the same question on /r/meta and my question was immediately removed without explanation. I tried to message the mods and got bounced with a "you cannot send a message to that user" error.

PaulHoule a month ago

Circa 2009 or so I was interested in automated link building systems. There were some sites that had no defenses, but I saw enough going on around Reddit that I just didn't want to mess with it.

brudgers a month ago

My guess is they see correlation between patterns of automated access and problematic behaviors.

In other words, the current state is the result of hard-won experience, not syllogistic reasoning.

  • ksajadi (OP) a month ago

    That is a good hypothesis, but then you’d imagine they could combine it with registered clients to keep crawlers accountable.

    • brudgers a month ago

      ‘Keeping crawlers accountable’ sounds like whack-a-mole.

      At least to me, because benign crawling is likely to be a tiny subset of all crawling activity that is not done by Google, etc.

maheenaslam a month ago

I guess it's Reddit saying we don’t want just any bot crawling around, but if you need the data in a simple format, here’s an easy way to get it.
