dmarks84 / Coursework_Capstone_Full_Data_Engineering

Final Project for IBM Data Engineering & Python Professional Certificate -- Applied all skills and methods utilized in the series of courses for this certification

Project(CapstoneProject_Full_Data_Engineering)

Part of the Coursera series: IBM Data Engineering & Python

Summary

In this project, I applied the skills and knowledge gained during the courses leading up to it. We were tasked with ingesting OLTP data by reading a .csv file and by querying a SQL (MySQL) database. This data was then exported for additional querying and manipulation in a NoSQL database (MongoDB). We then consolidated the data in a data warehouse and performed additional SQL queries and manipulation, this time using PostgreSQL. We created some visualizations of the data, then set up a pipeline to automate ETL going forward. We ended the project by developing an automated process to create a machine learning model that predicts future behavior.
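The extract, transform, and load flow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it uses Python's built-in `sqlite3` as a stand-in for both the MySQL source and the PostgreSQL warehouse so that it runs without any servers, and the sample data, the table names (`products`, `fact_sales`), and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the project's .csv export.
SAMPLE_CSV = """order_id,product,quantity
1,keyboard,2
2,mouse,5
3,keyboard,1
"""


def extract_from_csv(text):
    """Parse CSV text into (order_id, product, quantity) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(r["order_id"]), r["product"], int(r["quantity"])) for r in reader]


def extract_from_sql(conn):
    """Query the stand-in OLTP database for a product -> unit price mapping."""
    rows = conn.execute("SELECT product, unit_price FROM products").fetchall()
    return dict(rows)


def transform(orders, prices):
    """Join orders with prices and add a computed revenue column."""
    return [(oid, prod, qty, qty * prices[prod]) for oid, prod, qty in orders]


def load(conn, rows):
    """Write transformed rows into a hypothetical warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales ("
        "order_id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER, revenue REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent if the pipeline is re-run.
    conn.executemany("INSERT OR REPLACE INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run extract -> transform -> load, then return a summary query result."""
    # Stand-in for the MySQL OLTP source (assumption: in-memory SQLite).
    oltp = sqlite3.connect(":memory:")
    oltp.execute("CREATE TABLE products (product TEXT PRIMARY KEY, unit_price REAL)")
    oltp.executemany(
        "INSERT INTO products VALUES (?, ?)", [("keyboard", 30.0), ("mouse", 10.0)]
    )

    # Stand-in for the PostgreSQL data warehouse (assumption: in-memory SQLite).
    warehouse = sqlite3.connect(":memory:")

    orders = extract_from_csv(SAMPLE_CSV)
    prices = extract_from_sql(oltp)
    load(warehouse, transform(orders, prices))

    return warehouse.execute(
        "SELECT product, SUM(quantity), SUM(revenue) "
        "FROM fact_sales GROUP BY product ORDER BY product"
    ).fetchall()


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

In a scheduled setting, each of these functions would map naturally onto one task in a pipeline, with the database connections pointing at real MySQL and PostgreSQL instances instead of in-memory stand-ins.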

Skills (Developed & Applied)

Programming, Python, RDBMS & SQL, SQL (MySQL), SQL (PostgreSQL), SQL (SQLite), NoSQL (Cassandra), NoSQL (MongoDB), Databases, Statistics, Probability, Linear Algebra, SciPy, NumPy, Pandas, Seaborn, Matplotlib, Plotly, BeautifulSoup, Dataframes, ETL/ELT & Data Pipelines, DAGs, Apache Airflow, Apache Kafka, Apache Spark, Apache Hadoop, Automation, Linux/Bash/Shell Commands, Webscraping, APIs, Data Modeling, EDA, Data Visualization, Data Summarization, Data Reporting, Regression, Supervised ML, Communication, Technical Writing

About

License:BSD 3-Clause "New" or "Revised" License


Languages

Jupyter Notebook 72.1%, Python 27.9%