ETL/Data Pipelining Project Using AWS Services

In this repository, there are two folders: AmazonDataExtraction and SpotifyETL.

AmazonDataExtraction:

In this project, the goal was to extract data from the Amazon website using the BeautifulSoup library.
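
As a rough illustration of this kind of extraction, here is a minimal sketch that parses a small inline HTML snippet with BeautifulSoup. The markup, the class names ("product", "title", "price", "rating") and the helper names are all assumptions made up for the example; they are not the selectors used in this repository, and real Amazon pages are structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a product listing page.
# Real Amazon pages use different tags and classes, and they change over time.
SAMPLE_HTML = """
<div class="product">
  <span class="title">Wireless Mouse</span>
  <span class="price">$19.99</span>
  <span class="rating">4.5 out of 5 stars</span>
</div>
<div class="product">
  <span class="title">USB-C Cable</span>
  <span class="price">$7.49</span>
</div>
"""


def text_or_none(parent, css_class):
    """Return the stripped text of the first matching span, or None if absent."""
    tag = parent.find("span", class_=css_class)
    return tag.get_text(strip=True) if tag else None


def extract_products(html):
    """Parse the HTML and return one dict per product block."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for block in soup.find_all("div", class_="product"):
        products.append({
            "title": text_or_none(block, "title"),
            "price": text_or_none(block, "price"),
            "rating": text_or_none(block, "rating"),
        })
    return products


products = extract_products(SAMPLE_HTML)
```

Returning None for a missing field, rather than raising, keeps one incomplete listing from stopping the whole extraction run.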

After consolidating the logic from the initial rough work, I defined all the functions and extracted the data from the website in:

The extra file "amazon_etl.py" collects all the functions in a single Python file so that they can later be assembled into a pipeline using Airflow.

The extracted data is loaded into S3 storage using Airflow.
The data is then modeled into a star schema and finally loaded into Redshift for further analytical work.
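
The S3 loading step could look roughly like the sketch below, which serializes extracted records to CSV and writes them with boto3's put_object. The function names, the key layout and the bucket name are assumptions for illustration only, and the upload itself needs boto3 installed and AWS credentials configured, so that call is left commented out.

```python
import csv
import io
from datetime import date


def records_to_csv(records):
    """Serialize a non-empty list of dicts (all with the same keys) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def build_s3_key(prefix, run_date):
    """Build a date-partitioned object key (hypothetical layout)."""
    return f"{prefix}/{run_date.isoformat()}/products.csv"


def upload_to_s3(bucket, key, body):
    """Upload the CSV text to S3; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))


rows = [
    {"title": "Wireless Mouse", "price": "19.99"},
    {"title": "USB-C Cable", "price": "7.49"},
]
csv_text = records_to_csv(rows)
s3_key = build_s3_key("amazon/raw", date(2024, 1, 15))
# upload_to_s3("my-etl-bucket", s3_key, csv_text)  # hypothetical bucket name
```

In an Airflow pipeline, a function like upload_to_s3 would typically be called from a task, with the run date supplied by the scheduler instead of being hard-coded.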

SpotifyETL:

In this project, the goal was to extract data from a Spotify playlist using the Spotify API (Spotipy library).
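
A minimal sketch of this extraction, assuming the standard shape of a Spotify playlist-items response: one function fetches the items through Spotipy, and another flattens the nested response into flat rows. The function names and output columns are assumptions for illustration, not the ones used in this repository, and the sample response is made-up data so the flattening can be tried without credentials.

```python
def fetch_playlist_items(playlist_id):
    """Fetch one page of playlist items.

    Requires spotipy plus the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET
    environment variables.
    """
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
    return sp.playlist_items(playlist_id)


def flatten_playlist(response):
    """Turn the nested playlist response into a list of flat row dicts."""
    rows = []
    for item in response.get("items", []):
        track = item.get("track")
        if not track:  # tracks that are no longer available come back empty
            continue
        rows.append({
            "track_id": track["id"],
            "track_name": track["name"],
            "album_name": track["album"]["name"],
            "artist_names": ", ".join(a["name"] for a in track["artists"]),
            "duration_ms": track["duration_ms"],
            "added_at": item["added_at"],
        })
    return rows


# Made-up sample mirroring the response structure, for offline testing.
sample_response = {
    "items": [
        {
            "added_at": "2024-01-15T10:00:00Z",
            "track": {
                "id": "t1",
                "name": "Song One",
                "duration_ms": 210000,
                "album": {"name": "Album A"},
                "artists": [{"name": "Artist X"}, {"name": "Artist Y"}],
            },
        },
        {"added_at": "2024-01-16T10:00:00Z", "track": None},
    ]
}
track_rows = flatten_playlist(sample_response)
```

Keeping the API call separate from the flattening logic makes the transformation easy to test on saved responses.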

Build a Python file that defines the extraction functions, based on the rough-work file above:

Now, create an EC2 instance in the AWS Console.
After connecting to the instance, install all the dependencies needed on the server:

Then, run the functions through Airflow, and the data will be loaded into S3 storage.

Finally, model the data into a star schema and load it into Redshift for further analytical work.
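
To illustrate the star schema idea, the sketch below splits flat track rows into two dimension tables (artists and albums) and one fact table that refers to them by surrogate key. The table and column names are assumptions for the example and are not taken from the actual Redshift model.

```python
def build_star_schema(rows):
    """Split flat track rows into artist/album dimensions and a track fact table."""
    artists = {}  # artist_name -> surrogate key
    albums = {}   # album_name -> surrogate key
    fact_tracks = []
    for row in rows:
        # setdefault returns the existing key or assigns the next integer.
        artist_key = artists.setdefault(row["artist_name"], len(artists) + 1)
        album_key = albums.setdefault(row["album_name"], len(albums) + 1)
        fact_tracks.append({
            "track_id": row["track_id"],
            "artist_key": artist_key,
            "album_key": album_key,
            "duration_ms": row["duration_ms"],
        })
    dim_artist = [
        {"artist_key": key, "artist_name": name} for name, key in artists.items()
    ]
    dim_album = [
        {"album_key": key, "album_name": name} for name, key in albums.items()
    ]
    return dim_artist, dim_album, fact_tracks


# Made-up sample rows; two tracks share the same artist and album.
flat_rows = [
    {"track_id": "t1", "artist_name": "Artist X", "album_name": "Album A", "duration_ms": 210000},
    {"track_id": "t2", "artist_name": "Artist X", "album_name": "Album A", "duration_ms": 185000},
    {"track_id": "t3", "artist_name": "Artist Y", "album_name": "Album B", "duration_ms": 240000},
]
dim_artist, dim_album, fact_tracks = build_star_schema(flat_rows)
```

Each artist and album name is stored once in its dimension table, while the fact table keeps only keys and measures, which is the layout analytical queries in a warehouse such as Redshift usually join over.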

My name is WAREPAM RICHARD SINGH. In this project, I have learned:

For more project updates, you can find me on:

Streamline your data flow with AWS Data Pipelining - a reliable and scalable solution for seamless data ingestion, processing, and storage
