MengtingWan / goodreads

code samples for the goodreads datasets

Home Page: https://mengtingwan.github.io/data/goodreads.html

How were the users chosen?

Santosh-Gupta opened this issue

I see that there are 876,145 total users in the dataset, but Goodreads has 90 million users (as of July 2019). I was wondering how those 876,145 users were selected. Was there a minimum number of ratings?

Hi Santosh, the users in this dataset are those who were in the top 1,000 book clubs (https://www.goodreads.com/group) as of early 2017 and chose to make their bookshelves public, so they are just a subset of the Goodreads community.

Are there any plans for a dataset covering the entire Goodreads user review base?

I started a script here, but it needs some work:

https://colab.research.google.com/drive/1uOyVlKaT4QFtce9yQpKj9hRtj5z8Uyta

It downloads reviews directly from RSS feeds, so it runs pretty fast. It still needs work in two areas: confirming that it has retrieved all the books for a user (I think there might be timeouts), and handling books that have several versions/editions.
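
For anyone picking this up, here is a minimal sketch of the "did I get every book?" part. It is not the code from the notebook. It assumes the feed is paginated, that each RSS `<item>` carries a `book_id` child element, and that a caller-supplied `fetch_page` function (hypothetical; this is where the HTTP request, timeout handling, and retries would live) returns the raw XML for a given page number. It keeps requesting pages until one comes back empty or contains nothing new, and drops duplicate items along the way.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def parse_items(rss_xml: str) -> List[Dict[str, str]]:
    """Return one dict per <item>, mapping child tag names to their text."""
    root = ET.fromstring(rss_xml)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("item")
    ]


def collect_reviews(
    fetch_page: Callable[[int], str],  # hypothetical: page number -> RSS XML
    key: str = "book_id",              # assumed tag name used to deduplicate
    max_pages: int = 1000,             # safety cap so the loop always ends
) -> List[Dict[str, str]]:
    """Walk the feed page by page and keep the first item seen per key."""
    seen: Dict[str, Dict[str, str]] = {}
    for page in range(1, max_pages + 1):
        items = parse_items(fetch_page(page))
        if not items:
            break  # an empty page is treated as the end of the feed
        new_items = 0
        for item in items:
            item_key = item.get(key)
            if item_key and item_key not in seen:
                seen[item_key] = item
                new_items += 1
        if new_items == 0:
            break  # the feed is only repeating items we already have
    return list(seen.values())


# Offline demo with canned pages instead of network calls.
PAGES = {
    1: "<rss><channel>"
       "<item><book_id>1</book_id><title>A</title></item>"
       "<item><book_id>2</book_id><title>B</title></item>"
       "</channel></rss>",
    2: "<rss><channel>"
       "<item><book_id>2</book_id><title>B</title></item>"
       "<item><book_id>3</book_id><title>C</title></item>"
       "</channel></rss>",
}
EMPTY_FEED = "<rss><channel></channel></rss>"

reviews = collect_reviews(lambda page: PAGES.get(page, EMPTY_FEED))
print([review["title"] for review in reviews])
```

This does not solve the editions problem: different editions of a book would normally have different IDs, so deduplicating on `book_id` only removes exact repeats. Merging editions would need some work-level identifier or a title/author heuristic, which the sketch leaves out.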