iamtodor / data-engineering-test-tasks


My portfolio of test tasks for Data Engineer positions from different companies.

All the code is formatted with Black: The Uncompromising Code Formatter.

Configured GitHub Actions

  1. Dependabot checks on a weekly basis

  2. After each commit, GitHub workflows run automated checks

Task 1

Description: calculate PySpark aggregations from the given CSV.

Tech:

  • Python
  • Spark
  • CSV

Task 2

Description: calculate PySpark aggregations from the given Parquet and CSV.

Tech:

  • Python
  • Spark
  • Parquet
  • CSV

Task 3

Description: calculate PySpark aggregations from the given CSV.

Tech:

  • Python
  • Spark
  • CSV

Task 4

Description:

  • calculate PySpark aggregations from the given Parquet
  • ingest the data into Postgres
  • read the data back from Postgres
  • calculate PySpark aggregations and save as CSV

Tech:

  • Python
  • Spark
  • Parquet
  • Postgres in Docker with persistent storage
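
The Postgres round trip can be sketched with Spark's JDBC data source. Every connection detail below (URL, credentials, table name) is a placeholder, not taken from the repo, and actually running the read/write requires the PostgreSQL JDBC driver on the Spark classpath:

```python
def jdbc_options(table: str) -> dict:
    """Build the option map for Spark's JDBC reader/writer.

    All values here are illustrative placeholders for a local
    Postgres-in-Docker setup.
    """
    return {
        "url": "jdbc:postgresql://localhost:5432/analytics",
        "dbtable": table,
        "user": "spark",
        "password": "spark",
        "driver": "org.postgresql.Driver",
    }


# With a SparkSession `spark` and a DataFrame `df`, the ingest/read/export
# steps would look roughly like:
#
#   df.write.format("jdbc").options(**jdbc_options("events")).mode("append").save()
#   df_back = spark.read.format("jdbc").options(**jdbc_options("events")).load()
#   df_back.groupBy("category").count().write.csv("out", header=True)
```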

Task 5

Description:

  • calculate PySpark metric and dimension aggregations from the given JSON
  • test the app

Tech:

  • Python
  • Spark
  • pytest: 91% test coverage according to Coverage.py
  • JSON/Parquet

Kafka pet project

The project itself lives in another GitHub repo. Its purpose is to demonstrate knowledge of Java, Kafka, Prometheus, and Grafana.

About

License: MIT


Languages

Python 99.8%, Ruby 0.2%