zjj-cathy / recsys

recommendation system written by matlab codes

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

mrec: A Library of Recommendation System based on Matlab.

Introduction
-----------------------------------------------------------------------------------
mrec is an academic project. Currently, it targets for item recommendation from implicit feedback, since rating prediction based recommendation is still in develop. It supports cross validation and holdout evaluation. It includes several important algorithms tailored for implicit feedback, such as WRMF[1], content-aware collaborative filtering for implicit feedback[3], GWMF (graph regularized weighted matrix factorization). Also it includes some state-of-the-art algorithm for location recommendation, such as IRenMF[4], GeoMF[2]. In the future, we will further implement algorithms including Bayesian personalized ranking based matrix factorization, non-negative matrix factorization, Poisson Matrix Factorization.


Usage (Refer to test/dataset/test_script.m for examples)
-----------------------------------------------------------------------------------
It is very simple to use via the portal "item_recommend".  The following two parameters are mandatory 
  1) a handle of recommendation algorithm, whose first parameter is the user-item matrix, and other options are provided by name-value pair. 
  2) a user-item preference matrix.

There are three types of library running schemes, including cross validation and holdout evaluation. 
  1. The following command specifies cross validation, where 'folds' means the number of folds. And it returns 'metric' which records recall(1:200), precision(1:200), ndcg(1:200), map(1:200), AUC, MPR(Mean Percentile Rank), and 'elapsed' which records training and testing time respectively. There are two rows in recall, precision, ndcg and map, the first row indicate the average value and the second row is the standard deviation of cross validation. For example, recall(1,20) is the averaged recall@20 while recall(2,20) is the standard deviation of cross validation.
  
  [metric, elapsed] = item_recommend(@(mat) iccf(mat, 'K', 50, 'max_iter', 20), data, 'folds', 5); which can be rewriten as
  [metric, elapsed] = item_recommend(@iccf, data, 'folds', 5, 'K', 50, 'max_iter', 20);
  
  There are five types of matrix split methods, by specified by 'fold_mode', whose first three options are 'un' (default) user-oriented split, 'in' item-oriented split, 'en' entry-oriented split. For example, 'un' indicates splitting each user data into 'folds' folds and aggregating the respective fold of all users together. Besides, 'u' splits users into 'folds' folds while 'i' splits items into 'folds' folds. The former one is used for cold-start user evaluation while the latter one is used for cold-start item evaluation.

  2. The following command specifies holdout evaluation, specified by the ratio of the testing set. Note that it is possible to specify the ratio of the training set by 'train_ratio' (default=1), but it means that in can not be used separately. In other words, it should always be used together with 'test_ratio', since it means it will get the training set from the remaining part by excluding the test set. Besides, it should be accompanied by 'times' option, indicating how many times are need for such holdout evaluation with specified test_ratio.
	
  item_recommend(@(mat) iccf(mat, 'K', 50, 'max_iter', 20), data, 'test_ratio', 0.2, 'times', 5);
	
  There are also five types of matrix split methods, specified by 'split_mode', the same as 'fold_mode'.
  
  3. If you have both training set and test set, using the following commands for training and testing.
  item_recommend(@(mat) iccf(mat, 'alpha', 30, 'K', 50, 'max_iter', 20), train, 'test', test);
  
  4. If only dataset is provided, it will return topk recommendation results for each user, where topk is a parameter with defalult value 200.
	

Besides, it provides some easy-to-use utility function, including 'hypersearch' for searching parameters, 'readContent' to read matrix from tuple-based dataset file/ feature file. You can refer script/locrec/script.m for detailed usage.

Here I would like to emphsis the iccf algorithm[3], since it subsumes weighted regularized matrix factrization, and enables taking any feature from both user side and item side into account. Below I will introduce the usage of this algorithm. A lot of options, including dimension 'K', maximum number of iteration 'max_iter', the weight of positive preference 'alpha',  could be specified in this algoritm. In order to specify features from user side and item side, it can specify 'X' as user side feature matrix and 'Y' as item size featur matrix. Below lists some examples

  item_recommend(@(mat) iccf(mat, 'K', 50, 'max_iter', 20, 'alpha', 30), data, 'test_ratio', 0.2, 'times', 5); # this is equivelent to WRMF[1].
	
Note that in this case zeros(size(data,1), 0) is the default value of 'X' and  zeros(size(data, 2), 0) is the default value of 'Y'.

When item/user features are fed, 'reg_u' and 'reg_i' is used to specify the weight/importance of these features.

  item_recommend(@(mat) iccf(mat, 'K', 50, 'max_iter', 20, 'alpha', 30, 'Y', item_featue_matrix, 'reg_i', 100), data, 'test_ratio', 0.2, 'times', 5); 
	
It is worth mentioning that 'piccf' is a parallel version of iccf. You can specify the same options as iccf.

When using the iccf algorithm for recommendation from implicit feedback datasets, 'alpha', 'reg_u' and 'reg_i' are three important parameters affecting the recommendation performance. And these parameters may be re-tuned when changing the dimension of latent space.
	
[1] Hu, Y., Koren, Y., & Volinsky, C. (2008, December). Collaborative filtering for implicit feedback datasets.  Proceedings of ICDM 2015 (pp. 263-272). IEEE.
[2] Lian, D., Zhao, C., Xie, X., Sun, G., Chen, E., & Rui, Y. GeoMF: joint geographical modeling and matrix factorization for point-of-interest recommendation. Proceedings of SIGKDD 2014 (pp. 831-840). ACM.
[3] Lian, D., Ge, Y., Zhang, F., Yuan, N. J., Xie, X., Zhou, T., & Rui, Y. Content-aware collaborative filtering for location recommendation based on human mobility data. Proceedings of ICDM 2015 (pp. 261-270). IEEE.
[4] Liu, Y., Wei, W., Sun, A., & Miao, C. Exploiting geographical neighborhood characteristics for location recommendation. Proceedings of CIKM 2014 (pp. 739-748). ACM.


About

recommendation system written by matlab codes


Languages

Language:MATLAB 73.7%Language:C 20.4%Language:M 3.5%Language:C++ 2.3%Language:Mercury 0.1%