hhua/collaborative_filtering

## Collaborative Filtering in Netflix Recommendation System

This is an assignment from 11-741 Information Retrieval, Carnegie Mellon University.

## Report - Corpus Exploration

Provide the following corpus statistics:

For user ID 1234576, provide the following:

For movie ID 4321, provide the following:

## The Experiments

The algorithms are based on the k-Nearest Neighbors (kNN) learning algorithm.

Input: a (movie, user) pair p = (movieA, userA)

1. Find the k nearest neighbors of p using a similarity metric of your choice. These neighbors form a list of (movie, user, rating) triples: N = {(m, u, r) | m = movie, u = user, r = rating in [1, 5]}.
2. Produce a prediction of the rating for p based on the neighbors N.

Output: a rating r in the range [1, 5]
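As a rough illustration of steps (1) and (2), here is a minimal Python sketch of a user-user variant. It assumes the ratings fit in memory as a nested dict `ratings[user][movie]`, uses cosine similarity over whole sparse rating vectors, and predicts with the plain mean. The names `find_neighbors` and `predict` and the fallback value of 3.0 are hypothetical, not taken from this repository.

```python
import math
from typing import Dict, List, Tuple

# Assumed in-memory layout (hypothetical): ratings[user][movie] = rating in [1, 5]
Ratings = Dict[int, Dict[int, float]]


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two sparse rating vectors (missing = 0)."""
    dot = sum(r * b[m] for m, r in a.items() if m in b)
    na = math.sqrt(sum(r * r for r in a.values()))
    nb = math.sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def find_neighbors(ratings: Ratings, movie: int, user: int, k: int) -> List[Tuple[int, int, float]]:
    """Step (1): (movie, user, rating) triples from the k users most similar
    to `user` among those who have rated `movie`."""
    target = ratings.get(user, {})
    scored = [
        (cosine(target, row), other, row[movie])
        for other, row in ratings.items()
        if other != user and movie in row
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(movie, other, r) for _, other, r in scored[:k]]


def predict(ratings: Ratings, movie: int, user: int, k: int = 10, default: float = 3.0) -> float:
    """Step (2): plain mean of the neighbors' ratings, kept within [1, 5]."""
    neighbors = find_neighbors(ratings, movie, user, k)
    if not neighbors:
        return default  # hypothetical fallback: nobody else rated this movie
    mean = sum(r for _, _, r in neighbors) / len(neighbors)
    return min(5.0, max(1.0, mean))
```

For example, with `ratings = {1: {10: 5, 20: 3}, 2: {10: 4, 20: 3, 30: 4}, 3: {10: 1, 30: 2}}`, `predict(ratings, 30, 1, k=1)` uses only user 2 (the closer neighbor) and returns 4.0, while `k=2` averages users 2 and 3 and returns 3.0.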

For the prediction step, there are two simple ways to predict ratings:

- the mean rating for this movie among the neighbors, or
- the weighted mean rating for this movie among the neighbors, using the similarity measure from step (1) as the weight.
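The two prediction rules can be sketched as follows, assuming each neighbor has been reduced to a (rating, similarity) pair and that similarities are non-negative (true for cosine similarity on raw 1 to 5 ratings). With non-negative weights the weighted mean is a convex combination of the neighbor ratings, so it stays in [1, 5]. The function names and the zero-weight fallback are illustrative assumptions.

```python
from typing import List, Tuple

# Each neighbor is (rating, similarity); similarities are assumed to be >= 0.
Neighbor = Tuple[float, float]


def mean_rating(neighbors: List[Neighbor]) -> float:
    """Plain mean of the neighbors' ratings for the target movie."""
    return sum(r for r, _ in neighbors) / len(neighbors)


def weighted_mean_rating(neighbors: List[Neighbor]) -> float:
    """Similarity-weighted mean: sum(sim * r) / sum(sim)."""
    total = sum(s for _, s in neighbors)
    if total == 0:
        # Every weight is zero, so fall back to the plain mean (an assumption).
        return mean_rating(neighbors)
    return sum(s * r for r, s in neighbors) / total
```

For neighbors `[(4, 0.9), (2, 0.1)]`, the plain mean is 3.0 while the weighted mean is about 3.8, pulled toward the more similar neighbor.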

Four specific variations are required for this assignment:

1. Find neighbors for step (1) using a user-user similarity metric of your choice. Use the mean or weighted mean rating for step (2).
2. Find neighbors for step (1) using a movie-movie similarity metric of your choice. Use the mean or weighted mean rating for step (2).
3. Use one of the above for step (1), but apply user-rating and/or movie-rating normalization for step (2).
4. Implement a custom algorithm of your choice. You may extend one of the above algorithms or come up with your own.
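One possible reading of the movie-movie and normalization variations, sketched under stated assumptions: the same nested-dict layout (`ratings[row][col]`), a hypothetical `transpose` helper that turns user rows into movie rows so user-user code can be reused as movie-movie, and mean-centering as the normalization. Mean-centering is only one common choice of normalization (z-score normalization is another), and the helper names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed layout (hypothetical): ratings[row][col], e.g. ratings[user][movie]
Ratings = Dict[int, Dict[int, float]]


def transpose(ratings: Ratings) -> Ratings:
    """Turn ratings[user][movie] into ratings[movie][user] (and back)."""
    flipped: Dict[int, Dict[int, float]] = defaultdict(dict)
    for row, cols in ratings.items():
        for col, r in cols.items():
            flipped[col][row] = r
    return dict(flipped)


def row_mean(ratings: Ratings, row: int) -> float:
    """Mean rating of one row (a user's mean, or a movie's mean if transposed)."""
    cols = ratings[row]
    return sum(cols.values()) / len(cols)


def normalized_prediction(ratings: Ratings, row: int,
                          neighbors: List[Tuple[int, float, float]]) -> float:
    """Mean-centered weighted prediction.

    neighbors holds (neighbor_row, neighbor_rating, similarity) triples.
    Each neighbor rating is centered on that neighbor's own mean, and the
    weighted average offset is added back to the target row's mean.
    """
    base = row_mean(ratings, row)
    norm = sum(abs(s) for _, _, s in neighbors)
    if norm == 0:
        return base  # no usable neighbors: fall back to the row mean
    offset = sum(s * (r - row_mean(ratings, other)) for other, r, s in neighbors) / norm
    return min(5.0, max(1.0, base + offset))  # clamp to the valid rating range
```

Applied to `ratings[user][movie]`, the means are user means (user-rating normalization); applied to `transpose(ratings)`, they become movie means. For instance, if user 1 rated two movies 5 and 3 (mean 4.0) and the only neighbor, whose mean is 3.0, rated the target movie 3, the normalized prediction is 4.0 rather than the plain mean of 3.0.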

## Implementation

The similarity between neighbors can be computed with the dot product, cosine similarity, or the Pearson correlation coefficient (PCC). For simplicity, I implemented cosine similarity.
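The three similarity measures can be sketched on sparse rating vectors stored as dicts (an assumed representation: a missing key means "not rated" and is treated as 0). PCC here is computed over co-rated keys only, which is one common convention among several; the function names are illustrative.

```python
import math
from typing import Dict

Vector = Dict[int, float]  # sparse rating vector, e.g. one user's ratings keyed by movie ID


def dot(a: Vector, b: Vector) -> float:
    """Dot product; missing entries count as 0, so only shared keys contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(v * b[k] for k, v in a.items() if k in b)


def cosine(a: Vector, b: Vector) -> float:
    """Dot product divided by the product of the two L2 norms."""
    na = math.sqrt(dot(a, a))
    nb = math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def pcc(a: Vector, b: Vector) -> float:
    """Pearson correlation computed over co-rated keys only."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0  # too few shared ratings to estimate a correlation
    ma = sum(a[k] for k in common) / len(common)
    mb = sum(b[k] for k in common) / len(common)
    num = sum((a[k] - ma) * (b[k] - mb) for k in common)
    den = (math.sqrt(sum((a[k] - ma) ** 2 for k in common))
           * math.sqrt(sum((b[k] - mb) ** 2 for k in common)))
    return num / den if den else 0.0
```

Cosine divides out vector length, so a user who rated many movies is not favored merely for having a longer vector; PCC additionally removes each user's rating offset over the co-rated keys.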

For normalization,