josephmachado / online_store

End to end data engineering project

Data engineering project: data extraction to analysis

For more details, read the blog here.

Architecture

Architecture diagram

Setup

Pre-requisites

  1. git
  2. Github account
  3. Terraform
  4. AWS account
  5. AWS CLI installed and configured
  6. Docker with at least 4GB of RAM and Docker Compose v1.27.0 or later
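
You can quickly confirm the prerequisites are installed by checking their versions (a minimal sketch; the required versions are listed above):

git --version
terraform -version
aws --version
docker --version
docker-compose --version # or: docker compose version, if you are on Compose v2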

Read this post for information on setting up CI/CD, DB migrations, IaC (Terraform), make commands, and automated testing.

Run these commands to set up your project locally and on the cloud.

# Clone the code as shown below.
git clone https://github.com/josephmachado/online_store.git
cd online_store

# Local run & test
make up # starts the docker containers on your computer & runs migrations under ./migrations
make ci # runs auto formatting, lint checks, & all the tests under ./tests

# Create AWS services with Terraform
make tf-init # Only needed on your first terraform run (or if you add new providers)
make infra-up # type in yes after verifying the changes TF will make
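
# Under the hood these targets roughly wrap Terraform (a sketch, based on the
# -chdir pattern used elsewhere in this README; see the Makefile for the exact flags):
#   make tf-init   ->  terraform -chdir=./terraform init
#   make infra-up  ->  terraform -chdir=./terraform apply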

# Wait until the EC2 instance is initialized; you can check this in the AWS console
# See "Status check" on the EC2 console; it should show "2/2 checks passed" before proceeding

make cloud-metabase # forwards the Metabase port from the EC2 instance to your machine and opens it in your browser

make cloud-dagster # forwards the Dagster port from the EC2 instance to your machine and opens it in your browser
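
The cloud-metabase and cloud-dagster targets forward ports from the EC2 instance to your machine; the sketch below shows an equivalent manual SSH port forward, assuming Metabase listens on port 3000 on the instance (check the Makefile and docker compose files for the exact ports; Dagster works the same way on its own port):

terraform -chdir=./terraform output -raw private_key > private_key.pem
chmod 600 private_key.pem
ssh -i private_key.pem -L 3000:localhost:3000 \
    ubuntu@$(terraform -chdir=./terraform output -raw ec2_public_dns)
# then open http://localhost:3000 in your browser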

You can connect Metabase (opened via make cloud-metabase) to the warehouse with the following credentials:

WAREHOUSE_USER: sde
WAREHOUSE_PASSWORD: password
WAREHOUSE_DB: warehouse
WAREHOUSE_HOST: warehouse_db
WAREHOUSE_PORT: 5432
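
To sanity-check these credentials, you can query the warehouse directly with psql (a sketch, assuming the warehouse runs as a Docker Compose service named warehouse_db, as the host value above suggests):

docker-compose exec warehouse_db psql -U sde -d warehouse -c 'SELECT 1;'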

Create database migrations as shown below.

make db-migration # enter a description, e.g., create some schema
# make your changes to the newly created file under ./migrations
make warehouse-migration # to run the new migration on your warehouse

For continuous delivery to work, set up the infrastructure with Terraform and define the following repository secrets. You can add repository secrets by going to Settings > Secrets > Actions > New repository secret.

  1. SERVER_SSH_KEY: Get this by running terraform -chdir=./terraform output -raw private_key in the project directory and paste the entire content into a new Actions secret called SERVER_SSH_KEY.
  2. REMOTE_HOST: Get this by running terraform -chdir=./terraform output -raw ec2_public_dns in the project directory.
  3. REMOTE_USER: The value for this is ubuntu.
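
If you use the GitHub CLI, you can set these secrets without the web UI (a sketch; assumes gh is installed and authenticated against your repository):

terraform -chdir=./terraform output -raw private_key | gh secret set SERVER_SSH_KEY
terraform -chdir=./terraform output -raw ec2_public_dns | gh secret set REMOTE_HOST
gh secret set REMOTE_USER --body "ubuntu"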

Tear down infra

After you are done, make sure to destroy your cloud infrastructure.

make down # Stop docker containers on your computer
make infra-down # type in yes after verifying the changes TF will make

References

  1. Dagster docs
  2. Metabase docs
  3. FastAPI docker
  4. Dagster docker setup
  5. dbt docs
