A Python code collection for top-k recommendation, refactored with TensorFlow.
The old repository is here.
The current implementation is written purely in Python; however, it is slower than the old one.
The collection will consist of the following methods:
- Bayesian Personalized Ranking (BPR)
- BPR is the very first version of the BPR-based methods.
- It is only applicable to the in-matrix recommendation scenario.
- Visual Bayesian Personalized Ranking (VBPR)
- VBPR extends BPR by incorporating visual content into the rating prediction.
- It can recommend videos in both in-matrix and out-of-matrix recommendation scenarios.
- DeepMusic (DPM)
- DPM uses a multilayer perceptron (MLP) to learn the content latent vectors from MFCC.
- It recommends videos in both in-matrix and out-of-matrix recommendation scenarios.
- Collaborative Topic Regression (CTR)
- CTR uses LDA to learn the topic distribution from the textual content vectors, then performs the collaborative regression to learn the user and item latent vectors.
- CTR can perform in-matrix and out-of-matrix recommendation but only with the textual content vectors.
- The original code can be downloaded from here.
- Collaborative Deep Learning (CDL)
- CDL uses a stacked denoising autoencoder (SDAE) to learn the content latent vectors, then performs collaborative regression to learn the user and item latent vectors.
- CDL can perform in-matrix and out-of-matrix recommendation.
- The original code can be downloaded from here.
- CDL originally supports textual content only.
- CDL can support non-textual content by replacing the binary visible layer with a Gaussian visible layer.
- Neural Collaborative Filtering (NCF)
- Collaborative Embedding Regression (CER)
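As a point of reference for the BPR-based methods listed above, the pairwise ranking objective of BPR can be sketched in NumPy. This is a minimal illustration and not the code in this repository: the function name `bpr_loss`, the inner-product scoring, and the `reg` parameter are assumptions made for the example.

```python
import numpy as np


def bpr_loss(user_vecs, pos_item_vecs, neg_item_vecs, reg=0.0):
    """Mean BPR loss over a batch of (user, liked item, other item) triples.

    Hypothetical helper for illustration. Each argument is an array of
    shape (batch, dim) holding latent vectors.
    """
    user_vecs = np.asarray(user_vecs, dtype=float)
    pos_item_vecs = np.asarray(pos_item_vecs, dtype=float)
    neg_item_vecs = np.asarray(neg_item_vecs, dtype=float)

    # Assumed scoring function: inner product of user and item vectors.
    pos_scores = np.sum(user_vecs * pos_item_vecs, axis=1)
    neg_scores = np.sum(user_vecs * neg_item_vecs, axis=1)

    # -log(sigmoid(d)) equals log(1 + exp(-d)); logaddexp keeps it stable.
    diff = pos_scores - neg_scores
    ranking_loss = np.mean(np.logaddexp(0.0, -diff))

    # Optional L2 penalty on the vectors in the batch.
    l2 = (np.sum(user_vecs ** 2)
          + np.sum(pos_item_vecs ** 2)
          + np.sum(neg_item_vecs ** 2))
    return ranking_loss + reg * l2 / len(diff)
```

The loss shrinks as the liked item scores higher than the other item for the same user, which is the ranking behaviour the BPR-based methods build on.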
All the code in the repository is written in Python 3.
To simplify the installation of Python 3, please use Anaconda.
The dependencies are numpy, scipy, and tensorflow.
After forking, you should configure several things before running the code:
- Use pip to install numpy, scipy, and tensorflow.
- Download the datasets.
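Before running anything, it can help to confirm that the three dependencies are visible to the interpreter. The snippet below is a small sketch using only the standard library; `missing_dependencies` is a hypothetical helper name, not part of this repository.

```python
import importlib.util


def missing_dependencies(names=("numpy", "scipy", "tensorflow")):
    """Return the top-level packages from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies are importable.")
```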
For training, you can run
python train.py
For evaluation, you can run
python evaluate.py -d data -m embed/cer -f 0 -sl im om
This evaluates CER's performance in both the in-matrix and out-of-matrix settings using the content feature (in our example, this is meta).
By default, the evaluation reports accuracy@5, 10, 15, 20, 25, and 30.
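This README does not spell out how accuracy@k is computed. The sketch below assumes one common hit-based reading: each test case holds out one liked video, the model scores it against a list of candidates, and accuracy@k is the fraction of test cases where the held-out video lands in the top k. The helper names and the default cutoffs tuple are illustrative and may differ from what evaluate.py actually does.

```python
DEFAULT_KS = (5, 10, 15, 20, 25, 30)


def rank_of_target(scores, target_index):
    """0-based rank of one candidate: how many candidates score strictly higher."""
    target_score = scores[target_index]
    return sum(1 for s in scores if s > target_score)


def accuracy_at_k(ranks, ks=DEFAULT_KS):
    """Fraction of test cases whose held-out liked video ranks within the top k.

    `ranks` holds one 0-based rank per test case (assumed definition).
    """
    if not ranks:
        raise ValueError("ranks must contain at least one test case")
    total = len(ranks)
    return {k: sum(1 for r in ranks if r < k) / total for k in ks}
```

For example, `accuracy_at_k([0, 4, 5, 29, 30, 100])` counts three of the six test cases as hits at k = 10.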
Due to file size limitations, the datasets for training and testing are hosted elsewhere.
At present, we provide two datasets derived from MovieLens 10M and Netflix:
MovieLens: ratings and features
Netflix: ratings and features
Each of them contains the following data files for experiments:
- uid:
- User id list where each line is a user id. The id sequence may not be continuous.
- vid:
- Video id list where each line is a video id. The id sequence may not be continuous.
- f?[tr|te][.|.im|.om].[idl|txt]:
- Rating-related files, where ? is the fold index, tr denotes the training set, te denotes the testing set, im denotes in-matrix evaluation, om denotes out-of-matrix evaluation, idl denotes an id list, and txt denotes a rating file.
- Each line in a rating file starts with a user id, followed by the corresponding video-rating pairs separated by commas. In each video-rating pair, 1 denotes like and 0 denotes dislike.
- For instance:
- f2tr.txt contains the ratings in the training set of fold 2
- f2te.im.txt contains the ratings in the test set of fold 2 for in-matrix evaluation
- f2te.om.txt contains the ratings in the test set of fold 2 for out-of-matrix evaluation
- The input data files for CTR are also provided. Their suffix is 'mfp'.
- The feature files can be read with pickle in binary mode. The feature vectors are aligned to the id list in vid.
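Because the ids are not necessarily continuous, looking up a video's feature vector requires a mapping from id to position in the vid list. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, ids are kept as strings, and the unpickled feature object is assumed to be an indexable sequence whose i-th entry belongs to the i-th id in vid. The actual layout of the feature files may differ.

```python
import pickle


def load_id_list(path):
    """Read an id list file (such as uid or vid): one id per line."""
    with open(path, "r") as f:
        return [line.strip() for line in f if line.strip()]


def index_ids(ids):
    """Map each id to its 0-based position in the list."""
    return {item_id: pos for pos, item_id in enumerate(ids)}


def load_features(path):
    """Unpickle a feature file opened in binary mode."""
    with open(path, "rb") as f:
        return pickle.load(f)


def feature_for(video_id, vid_index, features):
    """Look up the feature vector of one video.

    Assumes `features[i]` belongs to the i-th id in the vid list.
    """
    return features[vid_index[video_id]]
```

With `vid_index = index_ids(load_id_list("vid"))`, a call such as `feature_for("999", vid_index, features)` returns the row aligned to video 999, provided that id appears in vid.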
Please modify the data access paths inside the code so that it runs correctly.
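One way to avoid editing paths in several places is to build them from a single data directory. The sketch below follows the file names given as examples above (f2tr.txt, f2te.im.txt, f2te.om.txt); the function `rating_file` and its parameters are hypothetical, and `data_dir` stands for wherever the datasets were extracted.

```python
import os


def rating_file(data_dir, fold, split, scenario=None, ext="txt"):
    """Build the path of one rating-related file.

    split:    "tr" (training set) or "te" (testing set)
    scenario: None, "im" (in-matrix) or "om" (out-of-matrix)
    ext:      "txt" (rating file) or "idl" (id list)
    """
    if split not in ("tr", "te"):
        raise ValueError("split must be 'tr' or 'te'")
    if scenario not in (None, "im", "om"):
        raise ValueError("scenario must be None, 'im' or 'om'")
    if ext not in ("txt", "idl"):
        raise ValueError("ext must be 'txt' or 'idl'")

    name = "f{}{}".format(fold, split)
    if scenario is not None:
        name += "." + scenario
    return os.path.join(data_dir, name + "." + ext)
```

For instance, `rating_file("data", 2, "te", "im")` points at f2te.im.txt inside the data directory.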
The original 10380 videos can be downloaded from the link below:
Google Drive
The video meta information, such as title, plot, and actors, is in imdbpy.tgz.
The meta information is stored as imdbpy objects.
Please install imdbpy first and use pickle to read the provided files in binary mode.
For instance, you can access the imdbpy object for video 999 by:
>>> import imdb  # imdbpy must be installed so that pickle can rebuild the object
>>> import pickle
>>> with open('999.pkl', 'rb') as f:
...     meta_999 = pickle.load(f)
If you use the above code or data, please cite the paper below:
@article{VCRS,
author = {Xingzhong Du and Hongzhi Yin and Ling Chen and Yang Wang and Yi Yang and Xiaofang Zhou},
title = {Personalized Video Recommendation Using Rich Contents from Videos},
journal = {TKDE},
year = {2019}
}