Ask HN: Why Reddit blocks all automated access but has .json for all URLs?
Reddit is aggressively blocking all automated access (see their robots.txt) and uses a lot of heuristics to block crawlers that do not honor it.
However, at the same time all Reddit URLs can be made machine readable by adding a .json to the end.
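For illustration, the URL rewrite described above can be sketched in a few lines of Python. This is a minimal sketch, not Reddit's documented API: the helper name `to_json_url` is hypothetical, and it assumes the `.json` suffix belongs at the end of the path, before any query string. It only builds the URL; actually fetching it from a script may well be blocked or rate limited, as the question notes.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url: str) -> str:
    """Return the URL with ".json" appended to its path.

    Hypothetical helper: assumes the suffix goes at the end of the path,
    before the query string and fragment.
    """
    parts = urlsplit(url)
    # Drop a trailing slash so we get ".../r/python.json", not ".../r/python/.json".
    path = parts.path.rstrip("/")
    # Leave URLs that already end in ".json" unchanged.
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_json_url("https://www.reddit.com/r/python/"))
    print(to_json_url("https://www.reddit.com/r/python/top/?t=week"))
```

The query string is kept after the suffix so that parameters such as `t=week` still apply to the rewritten URL.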
Can anyone explain what's the point of that?

Reddit locked down their API three years ago, purportedly to monetize their content for AI. This also killed all the free third-party apps. https://arstechnica.com/gadgets/2023/10/reddit-may-block-sea...

JSON is a popular way to send data around, and the site was probably built around it long ago, using third-party libraries that are difficult to customize.

It does make one wonder why, for a site so easy to scrape, there aren't any popular third-party clients that use scraped data, the way FreeTube and NewPipe do with YouTube.

I just want to know why Reddit keeps banning people for no good reason, while letting bots roam free. It's turned into a bizarre place. Today I asked the same question on /r/meta and my question was immediately removed without explanation. I tried to message the mods and got bounced with a "you cannot send a message to that user". It was much better years ago.

Circa 2009 or so I was interested in automated link building systems; there were some sites that had no defenses, but I saw enough going on around Reddit that I just didn't want to mess with it. My guess is they see a correlation between patterns of automated access and problematic behaviors. In other words, the current state is the result of hard-won experience, not syllogistic reasoning.

That is a good hypothesis, but then you’d imagine they could combine it with registered clients to keep crawlers accountable.

‘Keeping crawlers accountable’ sounds like whack-a-mole. At least to me, because benign crawling is likely to be a tiny subset of all crawling activity that is not done by Google, etc.

I guess it's Reddit saying: we don’t want just any bot crawling around, but if you need the data in a simple format, here’s an easy way to get it.