Ask HN: How to get a list of all submissions on Hacker News?
Hi, does anyone know how I could get a list of all the submissions ever posted to Hacker News, or how I would go about crawling ycombinator.com? I have some time on my hands and would like to expand my skills, and the HN archives seem like a good place to start.

What skills do you have?

Minor scripting skills. I was hoping to find an open source crawler, or a way to download an entire website.

You can use wget to download an entire site, but I really, really wouldn't pull HN like that. If you really want everything, here's one way to do it. Start with curl: pull http://news.ycombinator.com/item?id=1, then use Python and BeautifulSoup to extract the items that come with it. Then pick the smallest item number that you haven't pulled yet, and repeat.

But I would be absolutely certain you've got it all right before letting it loose, and you are ethically obliged to throttle your request rate when doing something like this. And think twice: there might be better ways of learning what you want to learn.
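For what it's worth, the crawl loop described above could be sketched roughly like this. It is only an illustration under several assumptions: it uses Python's standard library (urllib.request and html.parser) in place of curl and BeautifulSoup so it has no third-party dependencies; it assumes that any link of the form item?id=N on a page means item N "came with" that page, which may not hold for every page; and the names extract_item_ids, fetch_item and crawl, the max_id stopping point, and the 30-second delay are all made up for the example rather than anything HN prescribes.

```python
import re
import time
import urllib.request
from html.parser import HTMLParser

# URL pattern taken from the post above.
BASE_URL = "http://news.ycombinator.com/item?id={}"


class ItemLinkParser(HTMLParser):
    """Collects the numeric id from every <a href> containing item?id=N."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                match = re.search(r"item\?id=(\d+)", value)
                if match:
                    self.ids.add(int(match.group(1)))


def extract_item_ids(html):
    """Return the set of item ids linked from one page of HTML."""
    parser = ItemLinkParser()
    parser.feed(html)
    return parser.ids


def fetch_item(item_id):
    """Download one item page. This hits the live site, so use it sparingly."""
    with urllib.request.urlopen(BASE_URL.format(item_id), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(max_id, fetch=fetch_item, delay=30.0, sleep=time.sleep):
    """Fetch the smallest id not yet seen, record what came with it, repeat.

    max_id is a hypothetical upper bound (the post does not say where to stop).
    delay is an arbitrary placeholder pause, in seconds, after each request.
    Returns (ids actually fetched, all ids seen so far).
    """
    seen = set()
    fetched = []
    for item_id in range(1, max_id + 1):
        if item_id in seen:
            continue  # already came along with an earlier page
        html = fetch(item_id)
        fetched.append(item_id)
        seen.add(item_id)
        seen |= extract_item_ids(html)
        sleep(delay)  # throttle: pause after every single request
    return fetched, seen
```

Because fetch and sleep are parameters, you can run crawl against a few saved pages with a stub fetch function and make sure the logic is right before pointing it at the real site.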