Book dataset statistics can't align
Tokkiu opened this issue
高璟琦 commented
Hi, I used the preprocessing script you provided to process the book dataset. The data file was downloaded from the website you mentioned. However, I got the following book statistics:
total items: 367982
total users: 603668
total behaviors: 8898041
The processed data you provided, however, has the following statistics:
total items: 313966
total users: 459133
total behaviors: 8898041
All I did was:
- Download the dataset from http://jmcauley.ucsd.edu/data/amazon/index.html
- Decompress the file to get reviews_Books_5.json
- Run the script:
python preprocess/data.py book
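For reference, the raw counts can be cross-checked independently of data.py with a minimal sketch like the one below. It assumes that reviews_Books_5.json holds one JSON object per line with `reviewerID` and `asin` fields; the function names `count_stats` and `count_file` are hypothetical and not part of the repository. It applies no filtering, so it only reports what is in the raw file.

```python
import json


def count_stats(lines):
    """Count distinct items, distinct users, and total records.

    Assumes each non-empty line is one JSON object that contains
    "reviewerID" (user) and "asin" (item) keys.
    """
    users, items, behaviors = set(), set(), 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        users.add(record["reviewerID"])
        items.add(record["asin"])
        behaviors += 1
    return {"items": len(items), "users": len(users), "behaviors": behaviors}


def count_file(path):
    """Stream the file line by line so the whole dataset is never held in memory."""
    with open(path, encoding="utf-8") as f:
        return count_stats(f)
```

Calling `count_file("reviews_Books_5.json")` returns the three totals for the raw file, which can then be compared with what the preprocessing script prints.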
The mismatch confuses me. Could you explain it, or publish the latest version of data.py?
Thank you for your feedback!