anhle3112 / tmdb-box-office

tmdb-box-office

In this project, I'm working with the TMDB Box Office Revenue dataset from Kaggle. The data points provided include cast, crew, budget, genres, runtime, posters, languages, popularity, release dates, production companies, and production countries. The data is available on Kaggle: https://www.kaggle.com/c/tmdb-box-office-prediction/

The project is divided into 2 main phases:

  • Phase 1: Exploratory Data Analysis (EDA), Data Cleaning, and Feature Wrangling
  • Phase 2: Building a statistical model to predict movie revenue

The notebook in this repository focuses mainly on Phase 1, which is in turn divided into 4 main steps:

  • Step 1: Describe data
  • Step 2: Visualize & Wrangle data
  • Step 3: Handle missing data
  • Step 4: Create dummy variables

Steps 1, 2, and 3 are each broken down into numerical and categorical features.
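As a rough illustration of how these steps might look in pandas, here is a minimal sketch. It is not the notebook's actual code: the toy rows, the median and "unknown" fill strategies, and the variable names are all assumptions made for the example. Only the column names are borrowed from the fields listed above. Plotting (part of Step 2) is left out to keep the sketch short.

```python
import pandas as pd

# Hypothetical toy rows; the values are made up for illustration only.
df = pd.DataFrame({
    "budget": [1000000, 5000000, None, 20000000],
    "runtime": [90.0, None, 120.0, 105.0],
    "original_language": ["en", "fr", "en", None],
})

# Split columns into numerical vs categorical, as the steps above do.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Step 1: describe each group separately.
numeric_summary = df[numeric_cols].describe()
categorical_summary = df[categorical_cols].describe()

# Step 3: handle missing data (assumed strategy: median for numbers,
# a placeholder label for categories).
clean = df.copy()
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())
clean[categorical_cols] = clean[categorical_cols].fillna("unknown")

# Step 4: create dummy variables for the categorical columns.
encoded = pd.get_dummies(clean, columns=categorical_cols)

print(numeric_summary)
print(encoded.columns.tolist())
```

Filling missing values before calling `pd.get_dummies` means the missing category gets its own indicator column instead of being silently dropped. Whether that is appropriate depends on the feature.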

Languages

Language:Jupyter Notebook 100.0%