Does HN have anti-duplication protection?
Six months ago, I published part one of my NLP course and submitted this link: https://news.ycombinator.com/item?id=31421232
This morning, I wanted to share that I released the FULL course (same URL), but every time I hit submit, it redirects me to my previous submission.
Is this some anti-duplication protection in action? Does my account not have posting privileges?

----

Yes, there is at least some automatic anti-duplication stuff going on. The easiest way to see it in action is to re-submit an existing URL, exactly as-is, within a certain period of time, and notice that your submission just automatically becomes an upvote on the existing submission. That said, the anti-dupe mechanism doesn't catch all dupes, and from what I can recall of things said by dang, pg, etc. in the past, I think that is intentional. In particular, dupes are explicitly considered OK after a certain period of time. You can see this by noting that certain links have been submitted to HN, and sometimes discussed in detail, on 5, 10, or even 15 separate occasions.

I believe whatever automatic duplicate detection they have doesn't do much besides look for an exact match on the URL, though. It was known at one time that you could submit a dupe and get it to go through just by adding some extra stuff to the query string, for example.

What I can't speak to at all is how much effort (if any) the mods put into manually detecting and remediating dupes. I can't recall any of the mods ever addressing that point explicitly. My suspicion is that they do spend at least some cycles on it, but I can't prove it, and I may very well be wrong.

All this is totally unofficial, mind you. It's just based on my recollections from the various times this topic has been discussed in the past, and my own empirical observations. YMMV.

----

I noticed a lot of dupes on the HN Summary bot (https://github.com/jiggy-ai/hn_summary), so I was wondering if we needed an embedding similarity search to filter them. I checked the database of recent stories and found 194 instances of duplicates with the exact same story title or story URL in the few days the bot has been running. These were all story items that made it into the /topstories Hacker News API endpoint: https://gist.github.com/wskish/c8c6dbcb1c036882f3eb11b0660c0... (A minimal version of that exact-match check is sketched further down this thread.)

----

Judging by the countless times the same stories get posted here, I'd very much doubt there's any automatic de-duplication going on. But if the system is stopping you from submitting the same URL again, why not just put a meaningless query string on the URL so it's different from last time? e.g. https://www.nlpdemystified.org/?blah

BTW, I don't know if that will work. Just a thought.

----

Yes. The former submission got enough attention, so it shouldn't be submitted for a year. Solution: write a separate release announcement (there's certainly more to tell than just "done"?), link to the course from there, and submit the announcement.

It's official (https://news.ycombinator.com/newsfaq.html), and it's not limited to Show HNs:

>Are reposts ok? If a story has not had significant attention in the last year or so, a small number of reposts is ok. Otherwise we bury reposts as duplicates.

----

Unfortunately, since there's no penalty for spamming the site with endless repeat posts of articles already submitted countless times before, people continue to do it. Contrast that with the comments section, where there are penalties for abusive behaviour and people generally tend to behave in a more respectful way to other HN-ers.
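----

To make the exact-match idea above concrete, here is a rough sketch, not anything HN or the bot actually runs: it pulls the current /topstories list from the public HN Firebase API and flags items that collide on an exact title or on a query-string-stripped URL. The endpoint paths are the real HN API; the normalization rules and the 100-story limit are illustrative assumptions.

  # Sketch: flag /topstories items that collide on exact title or on a
  # query-string-stripped URL. Endpoint paths are the real HN Firebase API;
  # everything else here is an illustrative assumption.
  from collections import defaultdict
  from urllib.parse import urlsplit

  import requests

  API = "https://hacker-news.firebaseio.com/v0"

  def normalize(url):
      # Drop the query string and fragment so that, e.g.,
      # https://www.nlpdemystified.org/?blah and
      # https://www.nlpdemystified.org/ land on the same key.
      parts = urlsplit(url)
      return f"{parts.scheme}://{parts.netloc}{parts.path}".rstrip("/").lower()

  def find_dupes(limit=100):
      ids = requests.get(f"{API}/topstories.json", timeout=10).json()[:limit]
      by_url, by_title = defaultdict(list), defaultdict(list)
      for story_id in ids:
          item = requests.get(f"{API}/item/{story_id}.json", timeout=10).json() or {}
          if "url" not in item:
              continue  # Ask HN / text posts have no outbound URL
          by_url[normalize(item["url"])].append(item)
          by_title[item.get("title", "").strip().lower()].append(item)
      groups = list(by_url.values()) + list(by_title.values())
      return [g for g in groups if len(g) > 1]

  for group in find_dupes():
      print([(i["id"], i.get("title")) for i in group])

Note that an exact-match check like this (which is roughly what the comments above suggest HN itself does) still misses the harder case: different outlets covering the same story under different titles and URLs. That is where something like an embedding similarity search over titles would come in.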
----

I wonder: if the system really did check for duplication [I can't believe it does, given the number of repeat posts I see], and if people lost a karma point* for submitting an article that had already been submitted X times within the last X hours or days, would we see a lot less 'submission spamming'?

*Assuming that the people posting what are obviously repeat submissions are doing so for the 'internet points' rather than from any genuine belief that they're sharing something new with us, that might actually be a deterrent.

----

Most tech stories are submitted from different sources. Each time there is big news, there are 10 or 20 newspapers/blogs/tech sites that write their own coverage. The mods somehow try to keep only one if they are too similar, but sometimes it's difficult. And in some cases a new part of the story appears each day, and that sometimes makes the new post interesting.

----

I just find it really irritating. It's little different from spam, in my opinion, and I wish people would have the courtesy towards other HN users to spend a few seconds checking if a story has already been submitted before mashing the Submit button themselves. In some cases people are still re-submitting major tech stories days after they happened. A classic example was Musk's takeover of Twitter. There were 30+ submissions of that story alone when I counted partway through the day it happened, and people were still submitting it a couple of days later. That's just downright spamming, in my book. Who the hell can frequent HN and think that, 3 days after the event, no-one will yet have shared with the rest of us the biggest tech news story of the year? As I always say when getting on my soapbox about this: just try spending a few days using HN with the 'Newest' section as your landing page, and you'll soon see how much of a problem this is.

----

I agree, but it's worse with science news. There are a few science sites that just cut and paste the press release from the university. When the press release is horrible but linkbaity, the only good part is that I can take a look at the research paper and Wikipedia. So when it's reposted from another URL, I can write a comment trying to fix the press release or debunking the article.

I actually prefer to read /newest. The discussions on the front page usually have so many comments, and the discussion gets less technical. Repetition and spam are a problem, but it's nice to find a few jewels from time to time.

----

Will do. Thank you.

----

Yeah, it tries to block dupes, but funnily enough, the other day people were complaining that the same Elon Musk tweet got submitted at least 5 times in 30 minutes… Mind you, the throbbing vein in my temple, on that score, has stopped since I added this line to my uBlock Origin filters:

  news.ycombinator.com#?#tr.athing > td.title > span:-abp-contains(/[Tt]witter/):upward(tr)
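For reference, a roughly equivalent filter in uBlock Origin's native procedural syntax (untested; :has-text is uBO's counterpart to the ABP-style :-abp-contains used above) would be:

  news.ycombinator.com##tr.athing > td.title > span:has-text(/[Tt]witter/):upward(tr)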
----

>Yes. The former submission got enough attention, so it shouldn't be submitted for a year.

>It's official (https://news.ycombinator.com/newsfaq.html), and it's not limited to Show HNs.

If this is official policy, it's kind of laughable to single out 'Show HN' posts for this treatment, given how any major techie news story gets submitted over and over and over again for days on end. I've complained about this so many times in the past, to no avail. Anyone who uses the 'Newest' page as their HN landing page will know what I mean.
----

>Most tech stories are submitted from different sources. Each time there is big news, there are 10 or 20 newspapers/blogs/tech sites that write their own coverage.

Yes, and most of those news sources are pretty indistinguishable from each other too, simply re-hashing the same press releases as everyone else with the obligatory quoting of tweets, which is what passes for investigative journalism these days. I'll grant you that may account for some of the duplication. But I see the same stories from the same sources submitted time after time, too.
----

>people were complaining that the same Elon Musk tweet got submitted at least 5 times in 30 minutes…

Don't get me on my other soapbox: people posting fecking Tweets as 'news' stories!