New HN data dump available with over 14.5m entries

archive.org

79 points by cdman 9 years ago · 13 comments

minimaxir 9 years ago

If you're interested in playing with Hacker News data and don't want to download the entire dataset (or don't have the CPU/memory to perform large JOINs on stories/comments), you can use the Google BigQuery HN dataset, which is now up to date: https://cloud.google.com/bigquery/public-data/hacker-news (specifically, the .full table, which combines both stories and comments; the dedicated tables are not up to date).

  • venning 9 years ago

    I see this link mentioned all the time, but every time I try it I can't get it to work.

    Specifically, the "GO TO THE HACKER NEWS DATASET" big blue button on that page. It kicks me over to a Google Cloud console link, which spins for a few seconds, and then brings up a "Welcome to BigQuery!" modal. The only thing I can do then is click "Create a Project", which then kicks me over to the generic console with a listing of all APIs.

    Am I missing something?

    • minimaxir 9 years ago

      You'll need to create a GCP (Google Cloud Platform) project before you can use BigQuery (you don't need to provide a credit card if you remain in the free tier).

ers35 9 years ago

See also: A dump of the stories, comments, and users from the Firebase API as a SQLite database with a full text search index: https://archive.org/details/hackernews-2017-05-18.db
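
For anyone who hasn't queried a SQLite full text search index before, here is a minimal sketch of what that could look like from Python. It assumes an FTS5-style index and a Python build whose SQLite includes FTS5. The table name (items_fts), the columns (author, body), and the sample rows are hypothetical stand-ins built in memory; the actual dump's schema may be quite different, so inspect it first.

```python
import sqlite3

# Hypothetical schema: the real dump's table and column names may differ.
# An in-memory database keeps the sketch self-contained; to try it on the
# real file, pass the path of the downloaded .db to sqlite3.connect().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE items_fts USING fts5(author, body)")
conn.executemany(
    "INSERT INTO items_fts (author, body) VALUES (?, ?)",
    [
        ("alice", "BigQuery makes large joins on stories and comments cheap"),
        ("bob", "A SQLite dump with full text search is handy offline"),
        ("carol", "Torrents from archive.org are a nice touch"),
    ],
)


def search(conn, query, limit=10):
    """Return (author, body) rows matching an FTS5 query, best match first."""
    return conn.execute(
        "SELECT author, body FROM items_fts "
        "WHERE items_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


# The default tokenizer is case-insensitive, so "sqlite" matches "SQLite".
results = search(conn, "sqlite")
both_terms = search(conn, "stories AND comments")
```

Running `.schema` in the sqlite3 command-line shell against the downloaded file shows the real table names to substitute.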

natch 9 years ago

Kudos to archive.org for hosting torrents. It would be helpful to know the size of the download up front. Nice clean web page design; would love to see that one bit of information added.
