jzhao0626 / ProjectETL_Movie

ProjectETL_Movie

Project Proposal

DataBaes are looking at the correlation between IMDb ratings and title genres on Netflix. We followed an extract, transform, load (ETL) process to clean our data so that it can be analyzed.

  • Extract: We pulled CSV files with Python/Pandas from the following websites:

  • Transform: We started by inspecting the CSV files above to decide which information to keep. We then filtered out unwanted columns and null values from the netflix-shows data, narrowing the dataset to eight columns: show_id, type, title, country, date_added, release_year, rating, and listed_in.

  • Load: Our final dataframe is housed in a relational database titled DataBaes, created in pgAdmin. We chose a relational database because it matches the table-and-row structure of our dataframe. The final database holds information about Netflix titles along with viewer reviews from IMDb.

  • A more in-depth analysis and an appendix can be found in the ETL_Project-Netflix_Ratings_and_Production folder of the main repository.
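
The Transform and Load steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. The helper names (`transform_netflix`, `load_table`), the table name `netflix_titles`, and the sample rows are all hypothetical. It also assumes the column is spelled `date_added`, and that "dropping null values" means dropping any row with a null in one of the kept columns. The project's database lives in PostgreSQL (pgAdmin is its administration tool); an in-memory SQLite connection stands in here so the sketch runs without a database server.

```python
import sqlite3

import pandas as pd

# The eight columns named in the Transform step (spelling of date_added assumed).
KEEP_COLUMNS = [
    "show_id", "type", "title", "country",
    "date_added", "release_year", "rating", "listed_in",
]


def transform_netflix(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep only the eight columns of interest, then drop rows containing nulls."""
    narrowed = raw[KEEP_COLUMNS]
    return narrowed.dropna().reset_index(drop=True)


def load_table(df: pd.DataFrame, conn, table_name: str) -> None:
    """Write the dataframe to a SQL table, replacing any existing copy."""
    df.to_sql(table_name, conn, if_exists="replace", index=False)


# Made-up sample rows standing in for the real netflix-shows CSV.
sample = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "type": ["Movie", "TV Show", "Movie"],
    "title": ["Example A", "Example B", "Example C"],
    "director": ["Someone", None, "Someone Else"],  # not in KEEP_COLUMNS, so dropped
    "country": ["United States", None, "India"],    # the null here removes row s2
    "date_added": ["January 1, 2020", "March 5, 2019", "July 9, 2018"],
    "release_year": [2019, 2018, 2017],
    "rating": ["PG-13", "TV-MA", "TV-14"],
    "listed_in": ["Dramas", "Comedies", "Dramas, International Movies"],
})

clean = transform_netflix(sample)

# SQLite in memory is a stand-in; a PostgreSQL connection would be used in practice.
conn = sqlite3.connect(":memory:")
load_table(clean, conn, "netflix_titles")
row_count = conn.execute("SELECT COUNT(*) FROM netflix_titles").fetchone()[0]
```

To target PostgreSQL instead, the same `to_sql` call would typically receive a SQLAlchemy engine rather than the SQLite connection; the connection details depend on the local setup and are not given in this README.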

Languages

Language:Jupyter Notebook 100.0%