Exploring Goodreads data: Analysis of 10M books

4 points by ammar_x 2 years ago · 2 comments


> I wanted to analyze books, not editions, so I selected one edition for each book. I ended up with around 9 million unique books.

How did you select which edition counts as the true book? In some cases it appears that you picked a non-English edition of a book where the English edition would have been the more widely read one.

ammar_x (OP) 2 years ago

Good question! I selected the edition with the smallest Goodreads ID¹ among those that have both a publication date and a cover photo available. If no edition of a book has a publication date and cover photo, I take the one with the smallest ID.
And you're right: in a few cases, this resulted in picking a less widely read edition of a book.
¹ Assuming a smaller ID means the edition was added to Goodreads' database earlier.
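
For readers who want to try the same rule, here is a minimal sketch of how it could be written in Python. It is not the code used in the analysis: the field names (`work_id`, `edition_id`, `pub_date`, `cover_url`) and the sample records are hypothetical, and it assumes the fallback applies whenever no edition has both a publication date and a cover.

```python
from collections import defaultdict


def pick_edition(editions):
    """Pick one representative edition from the editions of a single book.

    Prefers editions that have both a publication date and a cover;
    if there are none, falls back to all editions. Within the chosen
    pool, the smallest edition ID wins.
    """
    complete = [e for e in editions if e.get("pub_date") and e.get("cover_url")]
    pool = complete or editions
    return min(pool, key=lambda e: e["edition_id"])


def pick_all(editions):
    """Group editions by book (hypothetical `work_id`) and pick one per book."""
    by_book = defaultdict(list)
    for e in editions:
        by_book[e["work_id"]].append(e)
    return {work_id: pick_edition(group) for work_id, group in by_book.items()}


# Made-up sample data, for illustration only.
sample = [
    {"work_id": 1, "edition_id": 10, "pub_date": None, "cover_url": None},
    {"work_id": 1, "edition_id": 25, "pub_date": "1999-01-01", "cover_url": "a.jpg"},
    {"work_id": 1, "edition_id": 40, "pub_date": "2005-06-01", "cover_url": "b.jpg"},
    {"work_id": 2, "edition_id": 7, "pub_date": None, "cover_url": None},
    {"work_id": 2, "edition_id": 3, "pub_date": "2001-03-15", "cover_url": None},
]

chosen = pick_all(sample)
for work_id, edition in sorted(chosen.items()):
    print(work_id, edition["edition_id"])
```

For book 1, edition 10 has the smallest ID but lacks metadata, so edition 25 is chosen; for book 2, no edition is complete, so the smallest ID (3) is chosen. Nothing in this rule looks at language or rating counts, which is why a less widely read edition can win.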
