Ask HN: How to get a list of all submissions on Hacker News?

2 points by maximumwage 16 years ago · 3 comments · 1 min read


Hi, does anyone know how I could get a list of all the submissions ever posted to Hacker News or how I would go about crawling ycombinator.com? I have some time on my hands and would like to expand my skills, and the HN archives seem like a good place to start.

RiderOfGiraffes 16 years ago

What skills do you have?

  • maximumwage (OP) 16 years ago

    Minor scripting skills. I was hoping to find an open source crawler, or a way to download an entire website.

    • RiderOfGiraffes 16 years ago

      You can use wget to download an entire site, but I really, really wouldn't pull HN like that.

      If you really want everything then here's one way to do it.

      Start with curl, pull http://news.ycombinator.com/item?id=1 and then use Python and BeautifulSoup to extract the items that come with it. Then pick the smallest number that you haven't pulled yet, and repeat.

      But I would make absolutely certain it all works correctly before letting it loose, and you are ethically obliged to throttle your requests when doing something like this.

      But think twice - there might be better ways of learning what you want to learn.
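
The loop described in the reply above (pull an item, note the items that come with it, then move to the smallest id not yet pulled, with a pause between requests) could be sketched roughly as follows. This is only an illustration under several assumptions: it uses the standard library's urllib and html.parser in place of curl and BeautifulSoup so that it has no third-party dependencies; it treats every "item?id=N" link on a page as an item already covered by that page, which will not hold for parent links or links inside comment text; and the names crawl, fetch, extract_item_ids, DELAY_SECONDS and max_id are made up for this sketch.

```python
import re
import time
import urllib.request
from html.parser import HTMLParser

BASE_URL = "http://news.ycombinator.com/item?id={}"  # URL form from the reply above
DELAY_SECONDS = 30  # assumed throttle between requests; choose conservatively


class ItemIdParser(HTMLParser):
    """Collects the numeric ids found in links of the form item?id=N."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                match = re.search(r"item\?id=(\d+)", value)
                if match:
                    self.ids.add(int(match.group(1)))


def extract_item_ids(html):
    """Return the set of item ids linked from a page (assumed to be on it)."""
    parser = ItemIdParser()
    parser.feed(html)
    return parser.ids


def fetch(item_id):
    """Download one item page and return its HTML as text."""
    with urllib.request.urlopen(BASE_URL.format(item_id), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(max_id, fetch_page=fetch, delay=DELAY_SECONDS):
    """Yield (item_id, html), always pulling the smallest id not yet seen."""
    seen = set()
    next_id = 1
    while next_id <= max_id:
        html = fetch_page(next_id)
        yield next_id, html
        seen.add(next_id)
        seen.update(extract_item_ids(html))
        # Skip forward to the smallest id that has not been covered yet.
        while next_id in seen:
            next_id += 1
        if next_id <= max_id:
            time.sleep(delay)  # throttle before the next request
```

Passing a stand-in for fetch_page (for example a function that reads saved pages from disk) makes it possible to check the logic on a handful of ids before any request goes to the live site, which is in the spirit of getting it right before letting it loose.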
