devashishpatel / IMDB-Top-5000

Data Exploration and mining insights into Hollywood for the top 5000 movies in the IMDB Database

Data Exploration and mining insights into Hollywood for the top 5000 movies on IMDB

Background

For as long as I can remember, IMDB has been my go-to place to learn anything and everything about movies. I owe my weird taste in movies to IMDB; without the site, I would have missed out on some truly beautiful gems.

When I saw a dataset of IMDB's top 5000 movies on Kaggle, I knew I had to mine it for insights.

The dataset comprises over 5000 movies, not just from Hollywood but from around the world. It includes financial information about the movies, their cast and directors, and the corresponding IMDB rank. Note that the dataset is in no way comprehensive; nonetheless, it is large enough to pique my interest. If you would like to know more, check it out here.

The objective of the data exploration was to answer three questions:

  • What are the most frequently used plot keywords, movie titles and genres?
  • How have the gross revenue and budget of movies trended over 100 years, in nominal as well as inflation-adjusted terms?
  • Who are the top actors, directors and movies in each of the past 10 decades?
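The three questions boil down to counting, rescaling and grouping. The sketch below illustrates each step in Python. It is not the author's R code. It assumes each movie is a record with hypothetical fields `title`, `year`, `gross`, `score` and `plot_keywords`, with keywords separated by `|`. The movies and the price-index values are made up; they are not real CPI figures. Inflation adjustment uses the standard rescaling real = nominal × index[base year] / index[movie year].

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy records: field names, the "|" keyword separator and all
# values are assumptions made for illustration, not taken from the dataset.
MOVIES = [
    {"title": "Movie A", "year": 1995, "gross": 1_000_000, "score": 7.9,
     "plot_keywords": "space|alien|war"},
    {"title": "Movie B", "year": 1998, "gross": 3_000_000, "score": 8.4,
     "plot_keywords": "war|love"},
    {"title": "Movie C", "year": 2004, "gross": 2_000_000, "score": 7.2,
     "plot_keywords": "alien|war"},
]

# Made-up price-index values keyed by year (NOT real CPI data).
PRICE_INDEX = {1995: 50.0, 1998: 55.0, 2004: 62.5, 2016: 80.0}


def keyword_counts(movies: List[dict], field: str = "plot_keywords") -> Counter:
    """Count how often each "|"-separated keyword appears across all movies."""
    counts: Counter = Counter()
    for movie in movies:
        # Skip empty strings produced by blank or trailing separators.
        counts.update(k for k in movie[field].split("|") if k)
    return counts


def inflation_adjusted(amount: float, year: int, base_year: int,
                       index: Dict[int, float]) -> float:
    """Express a nominal amount from `year` in `base_year` money."""
    return amount * index[base_year] / index[year]


def top_by_decade(movies: List[dict], key: str = "score") -> Dict[int, Tuple[str, float]]:
    """Return (title, value) of the highest-`key` movie in each decade."""
    best: Dict[int, Tuple[str, float]] = {}
    for movie in movies:
        decade = movie["year"] // 10 * 10  # e.g. 1998 -> 1990
        if decade not in best or movie[key] > best[decade][1]:
            best[decade] = (movie["title"], movie[key])
    return best


if __name__ == "__main__":
    print(keyword_counts(MOVIES).most_common(2))
    for m in MOVIES:
        real = inflation_adjusted(m["gross"], m["year"], 2016, PRICE_INDEX)
        print(m["title"], m["gross"], round(real))
    print(top_by_decade(MOVIES))
```

With these toy records, the two most common keywords are `[('war', 3), ('alien', 2)]`, and the top movie per decade is `{1990: ('Movie B', 8.4), 2000: ('Movie C', 7.2)}`. The base year (2016 here) is an arbitrary choice; any year in the index works, and the same grouping could rank actors or directors instead of titles. A real analysis would substitute a published price series, such as the US CPI, for `PRICE_INDEX`.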

To dive into the data exploration and code, please check my R Notebook.

Languages

Language: Jupyter Notebook 100.0%