collaborative-filtering predictions recommender-system user-based-recommendation python mae-values

MovieRecommendationSystem

Abstract

Recommender systems are used almost everywhere today, from e-commerce websites and streaming services to social media platforms. They have become such an integral part of day-to-day life that we may not even realize we are using one. In this project we implement collaborative filtering based recommender systems. There are two main approaches to collaborative filtering:

User-based
Item-based

We explore several user-based collaborative filtering techniques on the MovieLens dataset and recommend the top 5 movies for a given target user. The approaches used in this project are novel and give lower MAE values than the standard implementations.

Project Implementation

The general idea of the project is simple: implement a naive user-based recommendation system and build upon it to get better results.

Naive user-based collaborative filtering rests on the assumption that users who liked similar movies in the past will tend to like similar movies in the future. Hence, the first step in recommending movies to a target user is to find that user's most similar neighbours. The similarity metric used in our study is Pearson correlation. After calculating the similarity score of the target user with every other user, we keep the 10 most similar users for the remaining calculations. Predictions are made using the Resnick prediction formula.
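The naive pipeline described above can be sketched as follows. This is a minimal illustration, not the code in RS_main.py: the function names, the nested-dictionary layout (user id -> movie id -> rating) and the toy ratings are all assumptions made for the example, and it uses only the standard library so it runs without dependencies. Pearson correlation is computed over co-rated movies, and the Resnick formula adds the similarity-weighted mean-centred ratings of the neighbours to the target user's own mean rating.

```python
from math import sqrt


def mean_rating(ratings):
    """Mean of one user's ratings; `ratings` maps movie id -> rating."""
    return sum(ratings.values()) / len(ratings)


def pearson(a, b):
    """Pearson correlation of two users over the movies both have rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0  # not enough overlap to correlate
    mean_a = sum(a[m] for m in common) / len(common)
    mean_b = sum(b[m] for m in common) / len(common)
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    var_a = sum((a[m] - mean_a) ** 2 for m in common)
    var_b = sum((b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / sqrt(var_a * var_b)


def top_neighbours(target, users, k=10):
    """Return the k (user, similarity) pairs most similar to `target`."""
    sims = [(other, pearson(users[target], users[other]))
            for other in users if other != target]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


def resnick_predict(target, movie, users, k=10):
    """Resnick prediction: target mean + weighted neighbour deviations."""
    base = mean_rating(users[target])
    num = den = 0.0
    for other, sim in top_neighbours(target, users, k):
        if movie in users[other]:
            num += sim * (users[other][movie] - mean_rating(users[other]))
            den += abs(sim)
    return base if den == 0 else base + num / den


# Toy data, invented for illustration only.
users = {
    "u1": {"m1": 5, "m2": 3, "m3": 4},
    "u2": {"m1": 4, "m2": 2, "m3": 3, "m4": 4},
    "u3": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}
sim_u1_u2 = pearson(users["u1"], users["u2"])
prediction = resnick_predict("u1", "m4", users, k=10)
```

Here u1 and u2 rate their common movies with the same pattern, so they are perfectly correlated, while u3 is negatively correlated with u1. Note that the raw Resnick prediction is not clipped to the rating scale; whether to clip is a design choice left out of this sketch.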
Improvements made:
- Using tag data (unlike the naive approach, which used only rating data)
- Giving importance to the number of co-rated movies
- Increasing the number of neighbours
- Giving more importance to rare movies (those with fewer ratings)
- Using cosine similarity instead of Pearson Correlation
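Three of the improvements listed above have well-known textbook forms, sketched below. The README does not state which formulas the project actually uses, so everything here is an assumption: significance weighting for the number of co-rated movies, inverse user frequency for rare movies, and cosine similarity over co-rated movies. The function names, the `threshold` default and the toy data are hypothetical.

```python
from math import log, sqrt


def cosine(a, b):
    """Cosine similarity of two users over the movies both have rated."""
    common = set(a) & set(b)
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no overlap, or all-zero ratings
    return dot / (norm_a * norm_b)


def significance_weight(a, b, threshold=50):
    """Shrink factor min(n, threshold) / threshold, n = co-rated movies.

    Multiplying a similarity by this factor discounts neighbours whose
    similarity is based on only a handful of shared movies.
    """
    n = len(set(a) & set(b))
    return min(n, threshold) / threshold


def inverse_user_frequency(users):
    """log(N / n_m) per movie: movies rated by few users weigh more."""
    counts = {}
    for ratings in users.values():
        for movie in ratings:
            counts[movie] = counts.get(movie, 0) + 1
    total = len(users)
    return {movie: log(total / c) for movie, c in counts.items()}


# Toy data, invented for illustration only.
users = {
    "u1": {"m1": 5, "m2": 3, "m3": 4},
    "u2": {"m1": 4, "m2": 2, "m3": 3, "m4": 4},
    "u3": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}
iuf = inverse_user_frequency(users)
weighted_sim = cosine(users["u1"], users["u2"]) * significance_weight(users["u1"], users["u2"])
```

In this toy set every user has rated m1, so its inverse user frequency is zero, while m4 (rated by two of three users) gets a small positive weight.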
Testing: MAE loss was used along with 5-fold cross-validation.
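As a rough sketch of this evaluation setup, the snippet below computes the mean absolute error and averages it over 5 folds. How the project actually splits the data (by rating, by user, with which seed) is not stated here, so the fold construction, the `fit_predict` callback signature and the global-mean baseline are assumptions made purely for illustration.

```python
import random


def mae(actual, predicted):
    """Mean absolute error between two equal-length rating sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(ratings, fit_predict, k=5, seed=0):
    """Average MAE over k folds.

    `ratings` is a list of (user, movie, rating) triples.
    `fit_predict(train, pairs)` is a hypothetical callback that trains on
    `train` and returns one prediction per (user, movie) pair.
    """
    scores = []
    for held_out in k_fold_indices(len(ratings), k, seed):
        held = set(held_out)
        train = [r for i, r in enumerate(ratings) if i not in held]
        test = [ratings[i] for i in held_out]
        preds = fit_predict(train, [(u, m) for u, m, _ in test])
        scores.append(mae([r for _, _, r in test], preds))
    return sum(scores) / k


def global_mean_baseline(train, pairs):
    """Trivial predictor: the mean training rating for every pair."""
    mean = sum(r for _, _, r in train) / len(train)
    return [mean for _ in pairs]


# Toy ratings, invented for illustration only.
ratings = [(f"u{i % 3}", f"m{i}", float(1 + i % 5)) for i in range(10)]
average_mae = cross_validate(ratings, global_mean_baseline, k=5)
```

Any predictor with the same callback shape, such as a Resnick-based one, could be passed in place of `global_mean_baseline` to compare MAE values across implementations.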

Results

Naive Implementation
Improvement 1:
Improvement 2:
Improvement 3:
Improvement 4:
Improvement 5:

Usage

Libraries

numpy==1.19.4
pandas==1.1.4
scipy==1.5.4
prettytable==2.0.0
vaderSentiment==3.3.2
tqdm==4.51.0

To install libraries:

$ pip3 install -r requirements.txt

Data

MovieLens Dataset

Files

ratings.csv
tags.csv
movies.csv
test_user.txt: random users to make predictions on

Input Files

to RS_main.py:
- ratings.csv
- tags.csv
to test.py:
- test_user.txt
- utilitymatrix2.csv (generated from RS_main.py)
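For orientation, a utility matrix is a user-by-movie table of ratings. The sketch below builds one as a nested dictionary from rows in the MovieLens ratings.csv layout (userId, movieId, rating, timestamp). The on-disk layout of utilitymatrix2.csv is not described in this README, so the function name and the in-memory representation are assumptions, and the rows are a tiny inline sample rather than the real file.

```python
import csv
import io


def build_utility_matrix(csv_file):
    """Read MovieLens-style rating rows into {userId: {movieId: rating}}."""
    matrix = {}
    for row in csv.DictReader(csv_file):
        user = int(row["userId"])
        movie = int(row["movieId"])
        matrix.setdefault(user, {})[movie] = float(row["rating"])
    return matrix


# A few rows in the ratings.csv column layout, inlined so the sketch
# runs without the dataset.
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,1,4.0,964982703\n"
    "1,3,4.0,964981247\n"
    "2,1,3.5,1445714835\n"
)
utility = build_utility_matrix(sample)
```

To try it on the real data, an open file handle such as `open("./data/ratings.csv", newline="")` could be passed instead of the inline sample.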

Execution

RS_main.py:

Running this file executes the recommender system, including prediction and performance evaluation, for the basic implementation and its 5 improvements.

$ python3 RS_main.py --input ./data/ratings.csv --output eval.csv

test.py:

Executing this file lists the top-5 recommended movies, along with previously seen movies, for the 10 random users in test_user.txt, using our best-performing improvement to the recommender system.

$ python3 test.py --input ./data/test_user.txt --output output.csv

Output files

RS_main.py:
- utilitymatrix.csv
- utilitymatrix2.csv (main utility matrix)
- eval.csv: MAE values for each implementation
test.py:
- output.csv: final predictions

Acknowledgements

This project is a team effort. Contributions were made by:

Aniruddha Karajgi
Rohit K Bharadwaj
Jhaveri Ayush Rajesh
Rahul Jha
Pranay Khariwal

For more information regarding this project, have a look at our report.

About

A study on the naive user-based collaborative filtering algorithm and related improvements on the Movielens dataset.
