vaibhavmagon / Spark-Python-MovieReviews

Script to find similar movies in the MovieLens data set using Python & Spark (item-based collaborative filtering).

Home Page: https://grouplens.org/datasets/movielens/

Python & Spark Collaborative Filtering Script using the MovieLens Dataset.

This is a script, bundled with a dataset, that finds similarities between movies in a large data set using Python and Spark. You pass the ID of a movie, and the script finds similar movies using item-based collaborative filtering. The threshold values in the script can be changed to tune the results.
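
The item-based approach can be sketched in plain Python. This is a minimal sketch, not the script's own code: it assumes cosine similarity as the metric and (user_id, movie_id, rating) triples as input, and the names `movie_pair_similarities`, `similar_movies`, `score_threshold` and `co_occurrence_threshold` (including their default values) are hypothetical. Each step mirrors an RDD stage: key ratings by user, pair up every two movies a user rated, collect the rating pairs per movie pair, then score each pair.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cosine_similarity(rating_pairs):
    """Score two movies using the ratings of users who rated both.

    Returns (score, number_of_users_who_rated_both)."""
    sum_xx = sum_yy = sum_xy = 0.0
    for x, y in rating_pairs:
        sum_xx += x * x
        sum_yy += y * y
        sum_xy += x * y
    denominator = sqrt(sum_xx) * sqrt(sum_yy)
    score = sum_xy / denominator if denominator else 0.0
    return score, len(rating_pairs)


def movie_pair_similarities(ratings):
    """ratings: iterable of (user_id, movie_id, rating) triples."""
    # Key ratings by user (in Spark, an RDD keyed by user id).
    by_user = defaultdict(list)
    for user_id, movie_id, rating in ratings:
        by_user[user_id].append((movie_id, rating))

    # Pair every two movies each user rated; keeping only (a, b) with a < b
    # is like filtering duplicate pairs after a self-join.
    pair_ratings = defaultdict(list)
    for movie_ratings in by_user.values():
        for (m1, r1), (m2, r2) in combinations(sorted(movie_ratings), 2):
            if m1 < m2:
                pair_ratings[(m1, m2)].append((r1, r2))

    # Score each movie pair from its collected rating pairs.
    return {pair: cosine_similarity(rs) for pair, rs in pair_ratings.items()}


# Hypothetical default thresholds; the script's own values may differ.
def similar_movies(similarities, movie_id, score_threshold=0.97,
                   co_occurrence_threshold=50, top_n=10):
    """Return up to top_n (other_movie_id, score, count), best score first."""
    results = []
    for (m1, m2), (score, count) in similarities.items():
        if (movie_id in (m1, m2) and score > score_threshold
                and count > co_occurrence_threshold):
            other = m2 if m1 == movie_id else m1
            results.append((other, score, count))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:top_n]
```

Raising `score_threshold` or `co_occurrence_threshold` returns fewer, better-supported matches. In Spark the same steps could map onto RDD operations such as `join`, `filter`, `groupByKey` and `mapValues`, so the pairing and scoring run across the cluster instead of in in-memory dictionaries.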

More here: https://realpython.com/build-recommendation-engine-collaborative-filtering/

To Run

  • Install Spark & Python on your system.
  • Run the script with spark-submit, passing a movie ID:

spark-submit movie-similarities.py <id>

Here <id> is the ID of the movie to find similar movies for; for example, 50 is Star Wars.
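
The run command passes the movie ID as the first application argument. A minimal sketch of the input handling, assuming the script reads the MovieLens 100K files (`u.data`: tab-separated user id, item id, rating, timestamp; `u.item`: pipe-separated, with the movie ID followed by its title). The function names are hypothetical, not taken from the script.

```python
import sys


def parse_rating_line(line):
    """Parse one u.data line: user id, item id, rating, timestamp (tab-separated)."""
    user_id, movie_id, rating, _timestamp = line.split("\t")
    return int(user_id), int(movie_id), float(rating)


def load_movie_names(lines):
    """Map movie id -> title from u.item lines (pipe-separated: id|title|...)."""
    names = {}
    for line in lines:
        fields = line.split("|")
        if len(fields) > 1:  # skip blank lines
            names[int(fields[0])] = fields[1]
    return names


def movie_id_from_args(argv=None):
    """Read <id> from `spark-submit movie-similarities.py <id>`.

    spark-submit passes application arguments through to the script, so the
    id is argv[1] (argv[0] is the script path)."""
    argv = sys.argv if argv is None else argv
    if len(argv) < 2:
        raise SystemExit("usage: spark-submit movie-similarities.py <id>")
    return int(argv[1])
```

When reading `u.item` from disk, open it with `encoding="ISO-8859-1"`, since the file is not UTF-8 encoded; the exact file paths the script expects are an assumption here.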

Maintainers

  • Vaibhav Magon


Languages

Python 73.8%, Perl 13.8%, Shell 12.4%