mdauthentic / ETLProject-Batch

An ETL pipeline that captures data from REST APIs (Remotive, Adzuna & GitHub) and RSS feeds (StackOverflow). The data collected from the APIs is stored on local disk. The files are preprocessed, the ETL jobs are written in Spark and scheduled with Prefect to run every week, and the transformed data is loaded into PostgreSQL.
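To make the extract, transform and load stages concrete, here is a minimal sketch of one pass over an RSS feed. It is not the repository's code. The real jobs use Spark and Prefect, and this sketch swaps in the Python standard library instead: `xml.etree.ElementTree` for parsing, and `sqlite3` as a stand-in for PostgreSQL. The sample feed, the `jobs` table schema and the function names (`extract`, `transform`, `load`, `run_pipeline`) are hypothetical and exist only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 payload, standing in for a file the extract step saved to local disk.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item>
    <title>Data Engineer at ExampleCorp (Berlin, Germany)</title>
    <link>https://example.com/jobs/1</link>
    <pubDate>Mon, 06 Sep 2021 10:00:00 GMT</pubDate>
    <category>Python</category>
    <category>spark</category>
  </item>
  <item>
    <title>Backend Developer at Acme (remote)</title>
    <link>https://example.com/jobs/2</link>
    <pubDate>Tue, 07 Sep 2021 12:30:00 GMT</pubDate>
    <category>go</category>
  </item>
</channel></rss>"""


def extract(rss_text):
    """Parse RSS <item> elements into plain dicts (the raw layer)."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        yield {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pub_date": item.findtext("pubDate", default=""),
            "tags": [c.text for c in item.findall("category") if c.text],
        }


def transform(record):
    """Normalize one record: ISO-8601 date, lowercase, sorted, comma-joined tags."""
    published = parsedate_to_datetime(record["pub_date"])  # RFC 822 date -> datetime
    return (
        record["title"].strip(),
        record["link"],
        published.date().isoformat(),
        ",".join(sorted(t.lower() for t in record["tags"])),
    )


def load(rows, conn):
    """Insert transformed rows; sqlite3 stands in for PostgreSQL (assumption)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "title TEXT, url TEXT PRIMARY KEY, published TEXT, tags TEXT)"
    )
    # INSERT OR REPLACE (SQLite syntax) keeps weekly re-runs idempotent on url;
    # PostgreSQL would use INSERT ... ON CONFLICT (url) DO UPDATE instead.
    conn.executemany("INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(rss_text, conn):
    """One extract -> transform -> load pass; returns the number of rows loaded."""
    rows = [transform(r) for r in extract(rss_text)]
    load(rows, conn)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(SAMPLE_RSS, conn))
    print(conn.execute("SELECT url, published, tags FROM jobs ORDER BY url").fetchall())
```

Keying the table on the posting URL is one way to let a weekly scheduled run reload overlapping feed items without creating duplicates. Whether the original project deduplicates this way is not stated in its description.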
