My portfolio of test tasks from different companies for the Data Engineer role
All the code has been formatted by Black: The Uncompromising Code Formatter
Configured GitHub Actions:
- Dependabot checks on a weekly basis
- After each commit, GitHub workflows run the configured checks
Task 1
Description: calculate pyspark aggregations from the given csv.
Tech:
- python
- spark
- csv
Task 2
Description: calculate pyspark aggregations from the given parquet and csv.
Tech:
- python
- spark
- csv
Task 3
Description: calculate pyspark aggregations from the given csv.
Tech:
- python
- spark
- csv
Task 4
Description:
- calculate pyspark aggregations from the given parquet
- ingest the data to postgres
- read the data from postgres
- calculate pyspark aggregations and save as csv
Tech:
- python
- spark
- parquet
- postgres in docker with persistent storage
Task 5
Description:
- calculate pyspark metrics and dimensions aggregations from given json
- test the app
Tech:
- python
- spark
- pytest: 91% test coverage according to Coverage.py
- json/parquet
Kafka pet project
The project itself lives in a separate GitHub repo. Its purpose is to demonstrate Java, Kafka, Prometheus, and Grafana knowledge.