Program that produces two recommendation systems based on neighboorhood model and matrix factorization model, and ultimately produce predicted ratings for a movie given by some user. Matrix factorization is done with numpy.linalg.svd
Requirements can be found in requirements.txt
. Only numpy
and pandas
are needed.
Datasets are from the Movielens dataset
- ratings_train.csv: Ratings given to movies by users (Train set)
- ratings_valid.csv: Ratings given to movies by users (Test set)
- Neighborhood-based Model: Create representations for all movies and calculate cosine similarity to find nearest neightbors. Predict rating for user based on similarities
- Matrix Factorization: Decompose user-item matrix into user feature vector and item feature vector. Use 400 features only to produce predictions
- Optimized Model: Optimized matrix factorization model by using 250 features. Number of features is selected by cross validation
In practice, adjust input.txt to change user ids and movie ids to generate recommendations for
- Adjust input.txt. The input format is
user_id, movie_id
for each line - Run
python main.py
- Check
output.txt
for results. The output format is 3 lines ofuser_id, movie_id, prediction_score
for each input line. Each line corresponds to score produced from the three models in model explanation (ex. 1st line is from neighboorhood-based model)
Run python test.py
to check RMSE for some user in validation set, or for all the users in the validations set. Change variable test_user_id
in matrix_factor.py
to change test user id