khaledadrani / acetl

Another CSV Extract-Transform-Load Application!

acetl

Technical details

  • Python: 3.10.13
  • Database: PostgreSQL

Usage example (Locally)

Assuming you have cloned the repository, use Docker Compose to launch a database instance:

docker compose up

Generate some dummy data using the ETL CLI app:

python cli.py --help
python cli.py generate-dummy-data
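
To give a rough idea of what a dummy-data step like this can produce, here is a minimal sketch using only the standard library. It is not the project's implementation: the function name `generate_dummy_data`, the column names, and the product list are all hypothetical, and the real schema used by acetl may differ.

```python
import csv
import random
from pathlib import Path

# Hypothetical column names; the real schema used by acetl may differ.
FIELDS = ["order_id", "product", "quantity", "unit_price"]
PRODUCTS = ["apple", "bread", "milk", "soap"]


def generate_dummy_data(path, rows=100, seed=0):
    """Write `rows` random retail-like records to a CSV file at `path`."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same file
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for order_id in range(1, rows + 1):
            writer.writerow({
                "order_id": order_id,
                "product": rng.choice(PRODUCTS),
                "quantity": rng.randint(1, 10),
                "unit_price": round(rng.uniform(0.5, 20.0), 2),
            })
    return path
```

Seeding the generator keeps the output reproducible, which helps when the same file is later used to check an ETL run.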

Use the CLI application to run the ETL:

python cli.py etl-single-file data/retail_data_medium.csv
python cli.py etl-multiple-files /data
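
The two commands above correspond to a single-file and a directory-wide ingestion. A minimal sketch of that extract-transform-load flow could look like the following. It is an illustration under stated assumptions, not the code behind `cli.py`: the function names, the `retail` table, and its columns are hypothetical, and `sqlite3` stands in for Postgres so the example runs without a database server.

```python
import csv
import sqlite3
from pathlib import Path


def extract(csv_path):
    """Yield each CSV row as a dict keyed by the header line."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


def transform(row):
    """Strip whitespace and cast the numeric fields (hypothetical schema)."""
    return (
        row["product"].strip(),
        int(row["quantity"]),
        float(row["unit_price"]),
    )


def load(conn, records):
    """Insert the transformed records in a single batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS retail "
        "(product TEXT, quantity INTEGER, unit_price REAL)"
    )
    conn.executemany("INSERT INTO retail VALUES (?, ?, ?)", records)
    conn.commit()


def etl_single_file(conn, csv_path):
    """Run extract, transform, and load for one CSV file."""
    load(conn, (transform(row) for row in extract(csv_path)))


def etl_multiple_files(conn, directory):
    """Run the single-file ETL for every CSV file in `directory`."""
    # Sorted so files are ingested in a predictable order.
    for csv_path in sorted(Path(directory).glob("*.csv")):
        etl_single_file(conn, csv_path)
```

Rows are streamed through a generator rather than read into memory at once, which is one common way to keep larger CSV files manageable.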

Note: if you are working locally on Linux, make sure to install these dependencies:

# for the Postgres psycopg2 driver
apt update && apt install -y build-essential libpq-dev

To run the web application, start it with the command below and open http://localhost:8000/docs:

python main.py

Kubernetes Architecture Diagram

(Image: Kubernetes architecture diagram)

Done

  • ETL (Extract-Transform-Load) pipeline that ingests one or more CSV files into a database
  • A simple REST API that exposes the recently ingested data:
      • Description: Returns the first 10 rows from the database
      • Request: GET /read/first-chunck
      • Response: 200 OK
      • Response Body: JSON
      • Response Body Description: A list of 10 JSON objects
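
The read endpoint described above boils down to a limited query whose rows are serialized as JSON objects. The sketch below shows that core logic only, under assumptions: `read_first_chunk` and the `retail` table are hypothetical names, `sqlite3` stands in for Postgres, and the web framework that actually serves the route is left out.

```python
import json
import sqlite3


def read_first_chunk(conn, table="retail", limit=10):
    """Return the first `limit` rows of `table` as a list of dicts."""
    # A table name cannot be bound as a query parameter, so it is
    # interpolated here; it must be a trusted constant, never user input.
    # ORDER BY rowid makes "first" well defined in SQLite; with Postgres
    # one would order by a primary key instead.
    cursor = conn.execute(
        f"SELECT * FROM {table} ORDER BY rowid LIMIT ?", (limit,)
    )
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def response_body(conn):
    """Serialize the first chunk the way a JSON response body would be."""
    return json.dumps(read_first_chunk(conn))
```

Without an explicit ordering, a database is free to return any 10 rows, so a stable sort key is worth considering if clients rely on seeing the same "first" rows.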

TODO

  • update and improve docstrings
  • improve logging/tracing
  • add a GitHub Actions pipeline for tests
  • improve Kubernetes Architecture Diagram

License: MIT License