JayashreeKotte / Crowdfunding_ETL

Project 2 - ETL

Crowdfunding_ETL

Project 2 - ETL Mini Project

The ETL mini project uses Python, Pandas, and either Python dictionary methods or regular expressions to extract and transform the data.

After transforming the data, export it to flat files such as CSV files.

Four CSV files containing all the data in different tables should be generated.

An Entity Relationship Diagram (ERD) and table schema must be created using the information in the CSV files.

Finally, the CSV files must be uploaded into a Postgres database.
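The regular-expression route mentioned above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `contact_info` column, the sample rows and the output column names are all hypothetical, and the real layout of contacts.xlsx may differ.

```python
import pandas as pd

# Hypothetical sample data standing in for one raw spreadsheet column.
raw = pd.DataFrame({
    "contact_info": [
        '{"contact_id": 1, "name": "Ada Example", "email": "ada@example.com"}',
        '{"contact_id": 2, "name": "Bob Sample", "email": "bob@example.com"}',
    ]
})

# Named groups become the column names of the extracted DataFrame.
pattern = (
    r'"contact_id": (?P<contact_id>\d+), '
    r'"name": "(?P<name>[^"]+)", '
    r'"email": "(?P<email>[^"]+)"'
)

# Extract: pull the three fields out of each string.
contacts = raw["contact_info"].str.extract(pattern)

# Transform: cast the id to an integer and split the name into two columns.
contacts["contact_id"] = contacts["contact_id"].astype(int)
contacts[["first_name", "last_name"]] = contacts["name"].str.split(" ", n=1, expand=True)
contacts = contacts[["contact_id", "first_name", "last_name", "email"]]

# Export: with no path given, to_csv returns the CSV text instead of writing a file.
csv_text = contacts.to_csv(index=False)
print(csv_text)
```

Passing a file path to `to_csv` (for example one inside the Data folder) would write the file to disk instead of returning a string.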

How to Install and Run the code

  1. Ensure the Resources, SQL_files and Data folders are present in the Crowdfunding_ETL folder.

  2. Ensure the files ETL_Mini_Project_JKotte.ipynb, contacts.xlsx and crowdfunding.xlsx are present in the Resources folder.

  3. The Data folder contains all CSV files exported after transforming the data.

  4. The SQL_files folder contains the CrowdFunding_DB_ERD.png ERD image and the crowdfunding_db_schema.sql file to use in a Postgres database.

  5. To run the code, you will need access to Jupyter Notebook, PostgreSQL and pgAdmin.

  6. Open Jupyter Notebook from the terminal or command prompt and run the entire ETL_Mini_Project_JKotte.ipynb notebook in the Resources folder using the "Restart & Run All" option from the Kernel dropdown.

  7. Ensure everything runs successfully, and check that four CSV files are created and added to the Data folder.

  8. Inspect the four CSV files and create an ERD using the QuickDBD application.

  9. After entering the table names, column names and connections in the QuickDBD application, export the diagram as a PNG file (CrowdFunding_DB_ERD.png) and as a Postgres schema file (QuickDBD_schema.sql).

  10. Open pgAdmin and create a database named crowdfunding_db.

  11. Open the query tool for the database created above.

  12. In the query tool window, open the crowdfunding_db_schema.sql file from the SQL_files folder and run the code.

  13. First run everything except the SELECT statements to create the tables.

  14. Then verify that the tables were created by running the SELECT statements at the end of the file.

  15. You can also modify the SQL file to add more queries and experiment with the results.
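The check in step 7 (four CSV files present in the Data folder) can also be scripted. The helper below is a hypothetical sketch using only the standard library; the function name `summarize_csv_exports` is an assumption and is not part of the notebook.

```python
import csv
from pathlib import Path


def summarize_csv_exports(data_dir):
    """Return {file name: (header row, number of data rows)} for each CSV in data_dir."""
    summary = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])           # first row, or [] for an empty file
            row_count = sum(1 for _ in reader)  # remaining rows are data rows
        summary[path.name] = (header, row_count)
    return summary
```

Calling `summarize_csv_exports("Data")` from the Crowdfunding_ETL folder should return four entries if the notebook ran successfully, and the header rows it reports are a convenient starting point for the column names entered in QuickDBD.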

Crowdfunding_DB

Credits

To complete this challenge, I referred to previous pandas classes for the methods needed for accurate data transformation, and to different data traversal methods for a pandas DataFrame. I spoke to my TAs about data traversal and the ERD connections in QuickDBD. My peers were very helpful in offering small hints when I felt stuck. I also edited the original schema file from QuickDBD, as it produced an error when run as is. I manually edited the schema file to add the foreign key connections between tables.

References

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html

https://app.quickdatabasediagrams.com/#/

https://www.dataquest.io/wp-content/uploads/2019/03/python-regular-expressions-cheat-sheet.pdf

https://regex101.com/

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.items.html

https://docs.python.org/3/library/re.html#module-re

https://docs.python.org/3/howto/regex.html#regex-howto

https://pandas.pydata.org/docs/reference/api/pandas.Series.str.extract.html
