icharo-tb / GW2-SRS-with-AWS-Implementation

Continuation of GW2-SRS project focused on migrating the ETL to the cloud and making optimizations with Docker and Airflow.

GW2-SRS with AWS-S3 implementation

Overview πŸ‘€

This repository is an improvement of the original GW2-SRS project that makes use of AWS S3 to store data such as txt files of URLs and the CSV and JSON files produced by the ETL. Beyond that, this project also implements Docker as a container medium and Airflow to execute the ETL at monthly intervals (31 days).

GW2-SRS ETL

This module has been kept largely the same as the original, with a few changes to adapt it to the new work environment and to support some new features.

AWS S3

An S3 bucket is used to store both raw and clean data, so the project can later pivot to other AWS storage options such as DynamoDB or an RDBMS.
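A minimal sketch of how ETL output could be persisted to the bucket with boto3. The bucket name, key layout, and helper names below are assumptions for illustration, not the project's actual configuration:

```python
# Sketch: upload one ETL result file (CSV/JSON) to S3.
# Bucket name "gw2-srs-data" and the "clean/" prefix are hypothetical.
from pathlib import Path

def s3_key_for(local_path: str, prefix: str = "clean") -> str:
    """Build the S3 key under which a local ETL file would be stored."""
    return f"{prefix}/{Path(local_path).name}"

def upload_result(local_path: str, bucket: str = "gw2-srs-data") -> str:
    """Upload a file to S3 and return its key (needs AWS credentials configured)."""
    import boto3  # imported lazily; requires the boto3 package
    key = s3_key_for(local_path)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```

Keeping raw files under one prefix (e.g. `raw/`) and cleaned output under another (e.g. `clean/`) makes it straightforward to point DynamoDB or an RDBMS import at only the clean data later.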

Docker

The intention behind the use of Docker is to make the ETL functional anywhere, at any time: by packaging it as a Docker container, it becomes possible to execute the ETL on other devices, such as an AWS EC2 instance.
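A Dockerfile for such a container could look roughly like the sketch below. The base image, `requirements.txt`, and `main.py` entry point are assumptions about the project layout:

```dockerfile
# Hypothetical Dockerfile sketch for containerizing the ETL.
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Entry point name is an assumption; replace with the ETL's actual script.
CMD ["python", "main.py"]
```

Building the image once means the same environment runs identically on a laptop or an EC2 machine.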

Airflow

To make the ETL more functional, an Airflow DAG is implemented. By having Airflow execute the ETL monthly (about 31 days between runs), the pipeline not only keeps running but also keeps adding new data.
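A monthly DAG of this kind could be sketched as below, assuming Apache Airflow 2.x; the DAG id, task id, and `run_etl` callable are hypothetical names, and the import is guarded so the sketch stays runnable without Airflow installed:

```python
# Sketch: an Airflow DAG that triggers the ETL every 31 days.
from datetime import datetime, timedelta

def run_etl():
    """Placeholder for the GW2-SRS ETL entry point (hypothetical name)."""
    pass

try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="gw2_srs_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval=timedelta(days=31),  # roughly monthly, per the README
        catchup=False,
    ) as dag:
        PythonOperator(task_id="run_etl", python_callable=run_etl)
except ImportError:
    dag = None  # Airflow not installed in this environment; the DAG is a sketch
```

Setting `catchup=False` avoids backfilling a run for every 31-day window between `start_date` and the day the DAG is first enabled.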

Extra Information

  • This ETL runs in batch mode; no streaming is used, since the data must first be published on the web.
  • The ETL uses Python's logging module to write errors and informational messages to a log file.

Project Schema

About

License: GNU General Public License v3.0


Languages

  • Python 99.7%
  • Shell 0.3%