Data clean-up, building tables, creating queries
This project explores data cleaning and merging with a Kaggle dataset of Netflix Movies and TV Shows. I am using it to practice the Extract, Transform, Load (ETL) process, with a focus on cleaning the data during the transform step.
- Netflix Movies and TV Shows, 6 .csv files downloaded from Kaggle.
- GitHub
- Excel
- Pandas
- Jupyter Notebook
- QuickDBD
- SQL
- pgAdmin 4
- Python 3.7.13
- Create a GitHub repository and clone it to the local computer
- Download the Kaggle .csv files into a local folder
- Using Excel, open and inspect each file
- Open a Jupyter Notebook (.ipynb) and create DataFrames from the .csv data
- Check data types and null values, and reformat columns as needed
- Drop unnecessary columns
- Create a movies_df ...
- Open a new canvas in QuickDBD and create an entity relationship diagram (ERD) of the files
- Write SQL code to create the tables and build the Netflix database in pgAdmin
- Import the .csv files into the new tables in pgAdmin
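
The pandas steps in the list above (create DataFrames, check data types and null values, reformat columns, drop unnecessary columns) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample rows, the column names (`show_id`, `date_added`, `description`, and so on), the date format, and the helper `clean_titles` are all assumptions made up for the example, since the real Kaggle files may use a different schema.

```python
import io

import pandas as pd

# Hypothetical stand-in for one of the Kaggle .csv files.
# The columns and rows here are assumptions, not the real schema.
SAMPLE_CSV = """show_id,type,title,date_added,release_year,description
s1,Movie,Example Film,"September 25, 2021",2020,A placeholder description.
s2,TV Show,Example Series,,2019,Another placeholder.
s3,Movie,Second Film,"January 1, 2020",2018,
"""


def clean_titles(csv_source):
    """Load a .csv source, reformat the date column, drop an unused column."""
    df = pd.read_csv(csv_source)
    # Reformat: parse date strings; missing or unparseable values become NaT.
    df["date_added"] = pd.to_datetime(
        df["date_added"], format="%B %d, %Y", errors="coerce"
    )
    # Drop a column assumed to be unnecessary for the database tables.
    df = df.drop(columns=["description"])
    return df


# Inspect the raw data first: data types and null counts per column.
raw_df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(raw_df.dtypes)
print(raw_df.isnull().sum())

# Apply the clean-up and inspect the result.
clean_df = clean_titles(io.StringIO(SAMPLE_CSV))
print(clean_df.dtypes)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the downloaded .csv, and the cleaned DataFrame could be written back out with `clean_df.to_csv(..., index=False)` before importing it into pgAdmin.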