Danieldacruz7 / Data-Engineering-Capstone

A project that incorporates SQL, NoSQL, Apache Airflow, Cognos Analytics and PySpark into a data pipeline.

Project Motivation:

The capstone project aimed to develop an end-to-end data pipeline. In the real world, data often comes from multiple sources, including SQL databases, NoSQL databases and data warehouses. The pipeline needs to query these sources and retrieve data for further analysis.
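
The retrieval step described above can be illustrated with a minimal sketch. This is not the project's code: an in-memory `sqlite3` database stands in for the SQL source, a list of dictionaries stands in for a NoSQL collection, and every table, field and function name (`sales`, `product_id`, `extract_sql_sales`, and so on) is hypothetical.

```python
import sqlite3


def extract_sql_sales(conn):
    """Query the relational source and return rows as dictionaries."""
    cursor = conn.execute(
        "SELECT product_id, quantity FROM sales ORDER BY product_id"
    )
    return [{"product_id": pid, "quantity": qty} for pid, qty in cursor.fetchall()]


def extract_catalog(documents):
    """Index catalog documents (a stand-in for a NoSQL collection) by product id."""
    return {doc["_id"]: doc for doc in documents}


def combine(sales_rows, catalog):
    """Join sales rows with catalog documents and compute revenue per row."""
    combined = []
    for row in sales_rows:
        doc = catalog.get(row["product_id"])
        if doc is None:
            continue  # skip sales that have no matching catalog entry
        combined.append(
            {
                "product": doc["name"],
                "quantity": row["quantity"],
                "revenue": row["quantity"] * doc["price"],
            }
        )
    return combined


def build_demo_sources():
    """Create hypothetical demo data: an in-memory SQL table and a document list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product_id INTEGER, quantity INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 3), (2, 5), (3, 1)])
    conn.commit()
    documents = [
        {"_id": 1, "name": "laptop", "price": 1000},
        {"_id": 2, "name": "mouse", "price": 20},
    ]
    return conn, documents


if __name__ == "__main__":
    conn, documents = build_demo_sources()
    for record in combine(extract_sql_sales(conn), extract_catalog(documents)):
        print(record)
    conn.close()
```

In a real pipeline the two extract functions would talk to the actual database servers; keeping extraction separate from the combining step makes it easier to swap sources without touching the analysis logic.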

An analytics tool, IBM Cognos Analytics, was used to visualize the data. Apache Airflow was used to automate the data processing. Finally, a PySpark model was used to make sales predictions from the data.

How to Interact with the Project:

Following the structure of the IBM professional certificate, each stage of the data pipeline is kept in separate files. These include SQL scripts, screenshots, Python files and notebooks, all of which can be browsed in the repository.

Languages

Jupyter Notebook 80.7%, Python 19.3%