Exploring Goodreads data: Analysis of 10M books
ammar-alyousfi.com
> I wanted to analyze books, not editions, so I selected one edition for each book. I ended up with around 9 million unique books.
How did you select which edition was the true book? In some cases it appears that you picked a non-English version of a book where the English version would have been the more widely read one.
Good question! I selected the edition with the smallest Goodreads ID¹ among those that have both a publication date and a cover photo available. If no edition of a book has both, I take the one with the smallest ID overall.
And you're right: in a few cases, this picked a less widely read edition of a book.
1: Assuming smaller ID means earlier addition to Goodreads' database.
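For anyone who wants to try the same rule, here is a minimal sketch of the selection described above. It is not the code used for the analysis; the field names (`edition_id`, `book_id`, `publication_date`, `cover_url`) and both function names are made up for illustration, and a missing value is assumed to be `None` or an empty string.

```python
from collections import defaultdict
from typing import Optional, TypedDict


class Edition(TypedDict):
    # Hypothetical record layout; the real dataset's columns may differ.
    edition_id: int
    book_id: int
    publication_date: Optional[str]
    cover_url: Optional[str]


def pick_edition(editions: list[Edition]) -> Edition:
    """Pick one representative edition from the editions of a single book."""
    # Prefer editions that have both a publication date and a cover photo.
    complete = [e for e in editions if e["publication_date"] and e["cover_url"]]
    # If no edition qualifies, fall back to all editions of the book.
    candidates = complete or editions
    # Smallest ID is assumed to mean the earliest addition to the database.
    return min(candidates, key=lambda e: e["edition_id"])


def pick_representatives(editions: list[Edition]) -> dict[int, Edition]:
    """Group editions by book and pick one edition per book."""
    by_book: dict[int, list[Edition]] = defaultdict(list)
    for e in editions:
        by_book[e["book_id"]].append(e)
    return {book_id: pick_edition(group) for book_id, group in by_book.items()}
```

Nothing in this rule looks at language or rating counts, which is why a translated edition can win over the more widely read one.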