Show HN: A 2D map of the 1000 most popular books on HN

hnbooks.pieterma.es

8 points by pmaze a year ago · 0 comments · 2 min read

Hey HN! I love finding new books to read on here, and I wanted to recreate the serendipity of browsing someone else's bookshelf. I scraped 20k posts from HN threads about reading, extracted the book references with GPT, and visualised their embeddings as a map.

- The OpenAI embeddings were processed with UMAP and HDBSCAN. Projecting the text embeddings directly to 2D didn't yield visually interesting results. Instead, HDBSCAN is first applied to a higher-dimensional UMAP projection, and the resulting cluster memberships are then embedded into 2D with UMAP again, which produces the dense structures I wanted.

- There are multiple books with the same title. Currently, only the most popular one of those makes it onto the map.

- The book descriptions are based on extractions from individual posts combined with GPT's general knowledge. Quality varies, which leads to some oddly specific points, but I haven't yet found any that are outright wrong.

- It's surprisingly hard to get good-quality book cover images! I tried Google Books and a bunch of open APIs, but they all had issues. In the end, I got the covers from Goodreads through a hacked-together process that combines their autocomplete search with GPT for data linkage.
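
The note about duplicate titles amounts to a per-title filter that keeps only the most popular entry. Below is a minimal sketch of that idea, not the actual hnbooks code. It assumes each extracted reference carries a title, an author, and a mention count used as the popularity measure; the `BookRef` type and `keep_most_popular_per_title` function are hypothetical, and titles are treated as equal after trimming whitespace and ignoring case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookRef:
    """Hypothetical record for one extracted book reference."""
    title: str
    author: str
    mentions: int  # assumed popularity measure: number of HN posts citing the book


def keep_most_popular_per_title(books: list[BookRef]) -> list[BookRef]:
    """Keep only the most-mentioned book for each title (case-insensitive).

    On a tie, the book seen first wins. Output order follows the first
    appearance of each title in the input.
    """
    best: dict[str, BookRef] = {}
    for book in books:
        key = book.title.strip().casefold()  # normalise the title for comparison
        current = best.get(key)
        if current is None or book.mentions > current.mentions:
            best[key] = book
    return list(best.values())


# Usage with made-up sample data:
books = [
    BookRef("Dune", "A. N. Other", 3),
    BookRef("dune ", "Frank Herbert", 120),
    BookRef("Gödel, Escher, Bach", "Douglas Hofstadter", 90),
]
for book in keep_most_popular_per_title(books):
    print(book.title.strip(), "by", book.author)
```

Collapsing on title alone is exactly why a less popular book that shares its title with a more popular one disappears from the map; keying on (title, author) instead would keep both.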
