The goal of this project is to scrape movie information from IMDb, Rotten Tomatoes and Metacritic and use it to build a knowledge graph, which serves as the data source for a movie recommendation system.
- Scrapes data from the websites.
- Processes the data to remove HTML tags and Unicode values.
- Creates the CSV (Comma-Separated Values) files needed to import the data into the knowledge graph.
- Connects the Python program to the Neo4j AuraDB database that holds the knowledge graph.
- Provides a simple GUI that returns the top 5 recommended movies for the searched movie.
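The HTML and Unicode cleanup listed above could look roughly like the sketch below. It assumes that "removing Unicode values" means folding accented characters to plain ASCII and dropping anything without an ASCII equivalent; the function name clean_text is hypothetical and not taken from the project's scripts.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")  # matches anything that looks like an HTML tag
SPACE_RE = re.compile(r"\s+")    # runs of whitespace, collapsed to one space


def clean_text(raw: str) -> str:
    """Strip HTML tags, decode HTML entities and reduce the text to ASCII."""
    text = TAG_RE.sub(" ", raw)     # drop tags such as <a href="...">
    text = html.unescape(text)      # &amp; -> &, &nbsp; -> non-breaking space
    # NFKD splits accented letters into base letter + combining mark and maps
    # the non-breaking space to a plain space; encoding to ASCII with
    # "ignore" then keeps only the base letters.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    return SPACE_RE.sub(" ", text).strip()
```

For example, `<b>Amélie</b> (2001)` becomes `Amelie (2001)`. ASCII folding drops non-Latin titles entirely, so a dataset that needs them would call for a different rule.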
Note: Make sure Python and pip are installed on the system.
Install the Necessary Libraries (for first-time users who do not already have one or more of these libraries installed)
- Run the following in the terminal or command prompt:
  pip install beautifulsoup4 requests html5lib regex pandas scikit-learn neo4j python-dotenv
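To see which of these libraries are already installed before running pip, a small standard-library script along these lines can help. It is a sketch; the mapping from pip names to import names is an assumption based on each package's usual import name (for example, beautifulsoup4 is imported as bs4 and python-dotenv as dotenv).

```python
import importlib.util

# pip package name -> top-level module name used in "import ..." (they differ for some)
REQUIRED = {
    "beautifulsoup4": "bs4",
    "requests": "requests",
    "html5lib": "html5lib",
    "regex": "regex",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "neo4j": "neo4j",
    "python-dotenv": "dotenv",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print a ready-to-run install command for whatever is missing.
        print("pip install " + " ".join(missing))
    else:
        print("All required libraries are installed.")
```

csv and tkinter are not in the list because they ship with Python (on some Linux distributions tkinter is a separate system package, such as python3-tk on Debian and Ubuntu), and numpy is installed automatically as a dependency of pandas.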
Scrape Data (if you want to generate or update the scraped values in the Scraped Data folder)
- Open the IMDbScraping.py file and run the code. It creates or updates imdb_movie_list.csv in the Input folder with the current top 250 movies on IMDb.
- Repeat the previous step with the RottenTomatoesScraping.py and MetacriticScraping.py files. Each one updates its respective file based on the list of movies in imdb_movie_list.csv.
- Run the ProcessingData.py file. It processes the data into the movies.csv and ratings.csv files, which are used as input for further data processing.
- Open the Jupyter Notebook MovieRecommendationSystem.ipynb and run it as described in the notebook. This generates the output files to import into the Neo4j instance.
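How the three sources are combined into ratings.csv is not described in detail here. The sketch below shows one plausible approach with the standard csv module: attach the Rotten Tomatoes and Metacritic scores to each IMDb row by title. The column names (title, imdb_rating, tomatometer, metascore) and the sample data are hypothetical placeholders.

```python
import csv
import io


def merge_ratings(imdb_rows, rt_scores, mc_scores):
    """Attach Rotten Tomatoes and Metacritic scores to each IMDb row by title.

    Movies missing from a site get an empty string, so every output row
    has the same columns.
    """
    merged = []
    for row in imdb_rows:
        title = row["title"]
        merged.append({
            "title": title,
            "imdb_rating": row["imdb_rating"],
            "tomatometer": rt_scores.get(title, ""),
            "metascore": mc_scores.get(title, ""),
        })
    return merged


def write_csv(rows, handle):
    """Write a list of dicts as CSV, using the first row's keys as the header."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical placeholder data standing in for the scraped files.
imdb_rows = [
    {"title": "Movie A", "imdb_rating": "8.5"},
    {"title": "Movie B", "imdb_rating": "8.1"},
]
rt_scores = {"Movie A": "91"}
mc_scores = {"Movie A": "88", "Movie B": "70"}

buffer = io.StringIO()  # stands in for open("ratings.csv", "w", newline="")
write_csv(merge_ratings(imdb_rows, rt_scores, mc_scores), buffer)
print(buffer.getvalue())
```

Matching on the exact title string is fragile, since the same movie can be titled differently across sites; matching on title plus release year would be more robust. When writing to a real file, open it with newline="" as the csv module documentation recommends.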
Create the Knowledge Graph (Neo4j Graph Database)
- Log in to your Neo4j account (create one if it does not exist).
- Create an AuraDB instance.
- Save the credentials .env file in the project folder and update the folder name in the MovieRecommendationSystem.ipynb and MovieRecommender.py files.
- Click Import to open the import portal of the Neo4j platform.
- Add the CSV files present in the Output folder.
- In the options menu, click Open model and select neo4j_importer_model.json from the project folder.
- Click Run Import. The values are imported into the database.
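Once the .env file is in the project folder, the Python side can read the credentials and open a driver connection roughly as follows. This is a sketch that assumes the variable names NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD used in the credentials file Neo4j Aura offers for download; the helper names read_neo4j_credentials and connect are hypothetical. verify_connectivity() raises an exception if the instance cannot be reached, which makes it a quick check of the setup.

```python
import os


def read_neo4j_credentials(env=None):
    """Return (uri, username, password) from a mapping of environment variables.

    Assumes the names NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD.
    """
    env = os.environ if env is None else env
    try:
        return env["NEO4J_URI"], env["NEO4J_USERNAME"], env["NEO4J_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Missing Neo4j setting: {missing}") from None


def connect(env_path):
    """Load the .env file, open a driver and check that the instance answers."""
    # Imported here so read_neo4j_credentials works without these packages.
    from dotenv import load_dotenv
    from neo4j import GraphDatabase

    # Sets the file's key-value pairs as environment variables
    # (existing variables are not overwritten by default).
    load_dotenv(env_path)
    uri, user, password = read_neo4j_credentials()
    driver = GraphDatabase.driver(uri, auth=(user, password))
    driver.verify_connectivity()  # raises if the database cannot be reached
    return driver
```

After the import, calling connect() with the path to the .env file and running a trivial query such as MATCH (n) RETURN count(n) in a driver session is enough to confirm the nodes arrived. Close the driver with driver.close() when done.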
Run the Movie Recommender
- Run the code in the "Connecting the code to Neo4j Database and Running a Sample Query" section of the Jupyter Notebook to test the connection between the Python code and the Neo4j database.
- Run MovieRecommender.py. It opens the GUI window. Select the criteria as given in the program, then select the movie. The output shows the top 5 recommended movies based on the similarity score.
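This section does not specify how the similarity score is computed. As an illustration only, the sketch below ranks movies by cosine similarity between hypothetical bags of features (genres, for example) and returns the top 5. It uses plain Python rather than scikit-learn so each step is visible; the function names cosine and recommend are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recommend(movie, features, k=5):
    """Return the k movies most similar to `movie` as (title, score) pairs."""
    target = features[movie]
    scored = [(title, cosine(target, vec))
              for title, vec in features.items() if title != movie]
    # Highest score first; ties are broken alphabetically so results are stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

With scikit-learn installed, sklearn.metrics.pairwise.cosine_similarity computes the same measure for a whole matrix of feature vectors at once, which scales better than comparing pairs in a loop.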
Libraries Used
- BeautifulSoup - A Python library for pulling data out of HTML and XML files.
- requests - A simple Python library for sending HTTP requests.
- html5lib - A pure-Python library for parsing HTML.
- csv - A module in the Python standard library that implements classes to read and write tabular data in CSV (Comma-Separated Values) format.
- pandas - A Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.
- numpy - A Python library that provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays, including mathematical, logical and sorting operations.
- sklearn (scikit-learn) - A Python module for machine learning built on top of SciPy.
- neo4j - The Neo4j Python driver, which connects to the database and lets applications create, read, update, and delete data in the graph.
- dotenv (python-dotenv) - Reads key-value pairs from a .env file and can set them as environment variables.
- tkinter - The standard Python interface to the Tk GUI toolkit, used to build the GUI (Graphical User Interface).
References
- BeautifulSoup Documentation
- GeeksforGeeks - BeautifulSoup
- NithyaKrishnamoorthy - Knowledge Graph - the repository on which MovieRecommendationSystem.py is based
- Neo4j Python Driver Documentation
- Neo4j Cypher Query Language Documentation
- freeCodeCamp.org - Tkinter Course
and many more articles, documentation pages, repositories and videos.