THUDM / ComiRec

Source code and dataset for KDD 2020 paper "Controllable Multi-Interest Framework for Recommendation"

Book dataset statistics do not match

Tokkiu opened this issue

Hi, I used the preprocessing script you provided to process the book dataset. The data file was downloaded from the website you mentioned. However, I got the following book statistics:

total items: 367982
total users: 603668
total behaviors: 8898041

In contrast, the processed data you provided has the following statistics:
total items: 313966
total users: 459133
total behaviors: 8898041

All I did was:

  1. Download the dataset from http://jmcauley.ucsd.edu/data/amazon/index.html
  2. Decompress the file to get reviews_Books_5.json
  3. Run the script: python preprocess/data.py book
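
To check the raw counts independently of data.py, the statistics can be recomputed directly from the review file. This is only a minimal sketch: it assumes reviews_Books_5.json holds one JSON object per line with `reviewerID` and `asin` fields, and the helper name `dataset_stats` is hypothetical. It does not reproduce any filtering or splitting that data.py may apply.

```python
import json
from collections import Counter


def dataset_stats(lines, user_key="reviewerID", item_key="asin"):
    """Count distinct users, distinct items, and total interactions.

    `lines` is any iterable of strings, one JSON object per line.
    The field names are assumptions about the review file format.
    """
    users = Counter()
    items = Counter()
    behaviors = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        users[record[user_key]] += 1
        items[record[item_key]] += 1
        behaviors += 1
    return {
        "total items": len(items),
        "total users": len(users),
        "total behaviors": behaviors,
    }
```

Passing an open file handle for reviews_Books_5.json to `dataset_stats` and printing the result gives numbers that can be compared against both sets of statistics above.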

The mismatch confuses me. Could you explain the difference or publish the latest version of data.py?

Thank you for your feedback!