ankschoubey / Hybrid_Recommender_System

Uses KNN on MovieLens Dataset and Hybridization Method to suggest new movies.

Hybrid Recommender System

Built as part of my final-year graduation project.

Uses the MovieLens 100K dataset (2016 version).

Features/Methods

Collaborative filtering

  • User-based Collaborative Filtering
  • Item-based Collaborative Filtering
  • CF using Singular Value Decomposition (SVD)
  • Popularity based (implemented as the sum of all ratings received by a particular movie)
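
The methods above can be illustrated with a minimal sketch on a toy ratings matrix. This is not the project's code: the matrix, the function names (`popularity_scores`, `predict_knn`, `svd_reconstruct`) and the choice of cosine similarity with `k=2` neighbours are assumptions made for illustration, and treating unrated cells as 0 in the SVD step is a simplification.

```python
import numpy as np

# Toy ratings matrix (made-up data): rows are users, columns are movies,
# and 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 0],
], dtype=float)


def popularity_scores(matrix):
    """Popularity of each movie as the sum of all ratings it received."""
    return matrix.sum(axis=0)


def cosine_similarity_matrix(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    unit = matrix / norms
    return unit @ unit.T


def predict_knn(matrix, row, col, k=2):
    """Predict matrix[row, col] as the similarity-weighted mean of the
    k rows most similar to `row` that have a value in column `col`."""
    sims = cosine_similarity_matrix(matrix)[row]
    candidates = [r for r in range(matrix.shape[0])
                  if r != row and matrix[r, col] > 0]
    neighbours = sorted(candidates, key=lambda r: sims[r], reverse=True)[:k]
    weight = sum(sims[r] for r in neighbours)
    if weight == 0:
        return 0.0
    return sum(sims[r] * matrix[r, col] for r in neighbours) / weight


def svd_reconstruct(matrix, rank=2):
    """Low-rank approximation of the matrix from its top singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank, :]


popularity = popularity_scores(ratings)
# User-based: compare users (rows) to predict user 0's rating of movie 2.
user_based = predict_knn(ratings, row=0, col=2, k=2)
# Item-based: transpose so that rows are movies, then compare movies.
item_based = predict_knn(ratings.T, row=2, col=0, k=2)
approximation = svd_reconstruct(ratings, rank=2)
```

Transposing the matrix is enough to switch between the user-based and item-based variants, because the same neighbour search then runs over movies instead of users.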

Content Based Filtering

  • Simple Approach
  • Normalising the category vector (this reduced the size of the similarity matrix from 9000x9000 to 800x800)
  • Using Bag of Words (for movie titles)
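
As a rough illustration of the bag-of-words idea for movie titles, the sketch below counts the words in each title and compares titles by cosine similarity. The tokenisation rule (letters only, so release years are dropped), the sample titles and the helper names are assumptions for this example, not the project's implementation.

```python
import math
import re
from collections import Counter


def bag_of_words(title):
    """Lower-case the title and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", title.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, titles):
    """Rank every other title by similarity to `query`, best first."""
    query_bag = bag_of_words(query)
    scored = [(title, cosine(query_bag, bag_of_words(title)))
              for title in titles if title != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Sample titles written in the MovieLens "Title (Year)" style.
titles = ["Toy Story (1995)", "Toy Story 2 (1999)", "Heat (1995)"]
ranking = most_similar("Toy Story (1995)", titles)
```

With these titles, the sequel ranks first because it shares every word with the query, while "Heat (1995)" shares none once the year is ignored.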

Hybridization techniques

  • Mixed Hybridization
  • Switching
  • Feature Combining: Collaborative Via Content Based
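
The first two techniques can be sketched in a few lines. This is a simplified illustration under stated assumptions: mixed hybridization is shown as interleaving the ranked lists of several recommenders, and switching is shown as choosing a recommender by how many ratings the user has. The function names and the `min_ratings` threshold are hypothetical and not taken from the project.

```python
from itertools import zip_longest


def mixed_hybrid(*ranked_lists, top_n=5):
    """Interleave several ranked lists, skipping duplicates."""
    merged = []
    seen = set()
    # Take the first pick of every recommender, then the second, and so on.
    for tier in zip_longest(*ranked_lists):
        for movie in tier:
            if movie is not None and movie not in seen:
                seen.add(movie)
                merged.append(movie)
    return merged[:top_n]


def switching_hybrid(num_ratings, cf_list, content_list, min_ratings=5):
    """Use collaborative filtering once the user has enough ratings,
    otherwise fall back to content-based results (cold-start case)."""
    return cf_list if num_ratings >= min_ratings else content_list


cf_results = ["A", "B", "C"]
content_results = ["B", "D"]

mixed = mixed_hybrid(cf_results, content_results, top_n=4)
new_user = switching_hybrid(1, cf_results, content_results)
active_user = switching_hybrid(20, cf_results, content_results)
```

Switching is useful for new users, for whom collaborative filtering has too little data, while mixing presents results from every recommender side by side.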

User Interface

The UI received little attention because the focus was on the algorithms.

Load virtual environment and dependencies

Using Anaconda is recommended.

Creation: conda env create -f conda_environment.yml

Load Environment: source activate recommender

For those using pip

pip install -r requirements.txt

Download and extract

MovieLens Dataset.

For building database

Use MySQL. Create an empty database and remember its name.

Running

Make sure MySQL server is running.

Run sample_recommender.py to check everything works properly.

If you are setting up for the first time, you will be asked for the database details.

To reset, run generate_defaults.py or delete the defaults.json file.

You also have to update the DATABASE variable in Hybrid_Recommender_System/setting.py, which Django uses.
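
For orientation, a MySQL configuration in a Django settings file typically looks like the sketch below. Django's own documentation names this setting `DATABASES`; the variable in this project may be spelled differently, so match whatever the file already contains. The database name, user, password, host and port shown here are placeholder assumptions.

```python
# Placeholder values: replace NAME, USER and PASSWORD with your own.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's built-in MySQL backend
        "NAME": "recommender_db",              # the empty database created earlier
        "USER": "root",
        "PASSWORD": "",
        "HOST": "localhost",
        "PORT": "3306",                        # default MySQL port
    }
}
```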

Release Versions

v0.1-alpha - Command Line Interface

v0.2-alpha - Django Support

References

Recommender Systems Basics

SVD:

For Faster Numerical Computations in Python

NumPy Tutorial: Data analysis with Python

Numpy Cheatsheet

Pandas Tutorial: Data analysis with Python: Part 1

Pandas Tutorial: Data analysis with Python: Part 2

scipy.sparse.csr_matrix

sklearn.metrics.pairwise.cosine_similarity

Things not implemented

  1. Thoughts on working with a larger dataset
  2. Thoughts on working with a multi-criteria dataset

About

License: MIT License


Languages

Python 86.9%, HTML 12.3%, CSS 0.8%