Theoretical part
Learn as much as possible about the current state of the Hadoop ecosystem within a two-week period
Practical part
Set up a minimal Hadoop infrastructure on local VMs (VirtualBox or any other hypervisor) using Ansible
Set up ClickHouse on a local VM using Ansible
Create a MapReduce (MR) job (no restrictions on language/tooling) that reads a CSV file with 3 fields and 1,000,000 rows from HDFS and loads it into a ClickHouse table
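As an illustration only, here is one possible shape for the map step of such a job, assuming Hadoop Streaming with Python is the chosen tooling (the task does not require it). The names run_mapper, escape_tsv and EXPECTED_FIELDS are hypothetical, and the sample rows are made up for the demo. The sketch converts 3-field CSV rows into tab-separated rows, which ClickHouse can ingest in its TabSeparated format.

```python
import csv
import io
import sys

EXPECTED_FIELDS = 3  # the task's CSV has 3 fields per row


def escape_tsv(value):
    """Escape characters that are special in ClickHouse's TabSeparated format."""
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )


def run_mapper(src, dst):
    """Read CSV rows from src, write TSV rows to dst, return (written, skipped)."""
    written = skipped = 0
    for row in csv.reader(src):
        if len(row) != EXPECTED_FIELDS:
            skipped += 1  # malformed row: drop it rather than fail the task
            continue
        dst.write("\t".join(escape_tsv(field) for field in row) + "\n")
        written += 1
    return written, skipped


if __name__ == "__main__":
    # Demo on made-up in-memory data; a real mapper would read sys.stdin.
    sample = io.StringIO('1,alice,10.5\n2,"smith, bob",7\nbroken,row\n')
    out = io.StringIO()
    counts = run_mapper(sample, out)
    sys.stdout.write(out.getvalue())
    print(counts, file=sys.stderr)
```

In an actual Streaming job the entry point would call run_mapper(sys.stdin, sys.stdout) instead of the demo. How the resulting rows reach ClickHouse (for example, piping the job output into clickhouse-client with an INSERT ... FORMAT TabSeparated query, or inserting from a reducer) is left open here, since the task places no restrictions on it.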
Outcomes
Ansible code (as a Git repository snapshot) to set up Hadoop and ClickHouse
Sources (as a Git repository snapshot) to compile and submit the MR job
Ability to demonstrate an understanding of the Hadoop ecosystem
Deadline for submitting the practical part: 21.03.2019. If you have any questions, feel free to contact XXXXXX.
I am also putting you in contact with my colleague Natalya.