Lunnasi / EG

Home Work

EG


Theoretical part

Learn as much as possible about the current state of the Hadoop ecosystem within a two-week period

Practical part

Set up a minimal Hadoop infrastructure on local VMs (VirtualBox or any other hypervisor) using Ansible

Set up ClickHouse on a local VM using Ansible

Create an MR job (no restrictions on language/tooling) that loads a CSV file with 3 fields and 1,000,000 rows from HDFS into a ClickHouse table
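
One possible shape for the MR job, since the task leaves language and tooling open, is a map-only Hadoop Streaming job whose mapper reads CSV lines from standard input and sends them to ClickHouse in batches over its HTTP interface. The sketch below is illustrative only: the ClickHouse URL, the table name `default.events`, the batch size, and the environment variable names are assumptions, not part of the assignment, and the target table is assumed to already exist with three columns.

```python
#!/usr/bin/env python3
"""Map-only Hadoop Streaming mapper: CSV lines on stdin -> ClickHouse over HTTP."""
import csv
import io
import os
import sys
import urllib.parse
import urllib.request

# Assumed values for illustration; adjust to the actual VM and table.
CLICKHOUSE_URL = os.environ.get("CLICKHOUSE_URL", "http://localhost:8123/")
TABLE = os.environ.get("CLICKHOUSE_TABLE", "default.events")
BATCH_SIZE = 10000
EXPECTED_FIELDS = 3


def parse_row(line):
    """Return the fields of one CSV line, or None if it does not have 3 fields."""
    row = next(csv.reader([line.rstrip("\r\n")]), None)
    if row is None or len(row) != EXPECTED_FIELDS:
        return None
    return row


def batches(rows, size):
    """Group an iterable of rows into lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_csv_payload(batch):
    """Serialize a batch back to CSV bytes for the HTTP request body."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(batch)
    return buf.getvalue().encode("utf-8")


def send_batch(payload, url=CLICKHOUSE_URL, table=TABLE):
    """POST one batch to ClickHouse as the body of an INSERT ... FORMAT CSV query."""
    query = urllib.parse.urlencode({"query": f"INSERT INTO {table} FORMAT CSV"})
    request = urllib.request.Request(f"{url}?{query}", data=payload, method="POST")
    with urllib.request.urlopen(request) as response:
        response.read()


def main(stdin=sys.stdin, send=send_batch):
    """Skip malformed lines and send the remaining rows in batches."""
    rows = (row for row in map(parse_row, stdin) if row is not None)
    for batch in batches(rows, BATCH_SIZE):
        send(to_csv_payload(batch))


if __name__ == "__main__":
    main()
```

Such a mapper would typically be submitted with the Hadoop Streaming jar, with the number of reducers set to zero (`-D mapreduce.job.reduces=0`) and the script passed as `-mapper`. Batching keeps the number of HTTP requests small. Note that a retried map task would insert its rows again, so a real solution may need deduplication on the ClickHouse side.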

Outcomes

Ansible code (as a Git repository snapshot) to set up Hadoop and ClickHouse

Sources (as a Git repository snapshot) to compile and submit the MR job

Ability to demonstrate an understanding of the Hadoop ecosystem


Deadline for submitting the practical part: 21.03.2019. If you have any questions, feel free to contact XXXXXX

I am also putting you in contact with my colleague Natalya.
