Book dataset statistics can't align

Question

Tokkiu opened this issue 2 years ago

Hi, I used the preprocessing script you provided to process the book dataset. I downloaded the data file from the website you mentioned. However, I got the following book statistics:

total items: 367982
total users: 603668
total behaviors: 8898041

The processed data you provided, however, has these statistics:

total items: 313966
total users: 459133
total behaviors: 8898041

All I did was:

1. Download the dataset from http://jmcauley.ucsd.edu/data/amazon/index.html
2. Decompress the file to get reviews_Books_5.json
3. Run the script: python preprocess/data.py book
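
For reference, the three counts above can be cross-checked independently of data.py with a minimal sketch like the one below. It is not the repository's preprocessing code and applies no filtering. It assumes that each line of reviews_Books_5.json is a strict JSON object with the `reviewerID` and `asin` fields, and the function name `dataset_stats` is made up for illustration.

```python
import json
import os
from typing import Iterable, Tuple


def dataset_stats(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (total items, total users, total behaviors) for JSON-lines reviews.

    Assumption: every non-empty line is one JSON object with
    "asin" (item id) and "reviewerID" (user id) keys.
    """
    items, users = set(), set()
    behaviors = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        items.add(record["asin"])
        users.add(record["reviewerID"])
        behaviors += 1  # one review counts as one behavior
    return len(items), len(users), behaviors


if __name__ == "__main__":
    path = "reviews_Books_5.json"  # the decompressed file from step 2
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            n_items, n_users, n_behaviors = dataset_stats(f)
        print("total items:", n_items)
        print("total users:", n_users)
        print("total behaviors:", n_behaviors)
    else:
        print("file not found:", path)
```

If the raw counts from a sketch like this match one of the two sets of numbers, that would indicate whether the difference comes from the raw file or from a later step in the script.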

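The mismatch confuses me: the behavior counts are identical, but my run yields 54,016 more items and 144,535 more users than your processed data. Could you explain the difference or publish the latest version of data.py?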

Thank you for your feedback!