lc2a / spark-test

Spark Coding Exercise

Overview

The aim of this exercise is to demonstrate existing knowledge of Spark and Docker, or the ability to learn about them. Information on Spark can be found here.

This project contains an example Spark application which summarises the web traffic from the given Apache log file. Both Java and Scala versions are supplied; you can use either version, or try both if you want!

Tasks

  1. First you will need a Docker container running Spark. Start here to install Docker if you don't already have it.
  2. Next you will need to obtain a Spark Docker image, e.g. like this.
  3. Install and run the application, noting the results.
  4. Perform a code review of either or both versions, noting any problems and describing any changes you would suggest to the developer.
  5. Add code to a chosen version to apply a filter such that certain internal addresses (e.g. 10.10.10.1) can be ignored.
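
As a starting point for task 5, the filter can be sketched in plain Java as a predicate over raw log lines. This is only an illustration under stated assumptions, not the supplied application's code: the class name `InternalAddressFilter`, its method names, and the hard-coded ignore list are all hypothetical, and it assumes the log uses the Apache common or combined format, where the client address is the first space-delimited field of each line.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper; not part of the supplied application.
public class InternalAddressFilter {

    // Assumed ignore list. A real version would probably read this from configuration.
    private static final Set<String> IGNORED = Set.of("10.10.10.1");

    // Assumes the client address is the first space-delimited field of the line.
    static String clientAddress(String logLine) {
        String trimmed = logLine.trim();
        int space = trimmed.indexOf(' ');
        return space < 0 ? trimmed : trimmed.substring(0, space);
    }

    // True when the line should be kept, i.e. its client address is not on the ignore list.
    static boolean isExternal(String logLine) {
        return !IGNORED.contains(clientAddress(logLine));
    }

    // Plain-Java stand-in for filtering a distributed collection of log lines.
    static List<String> keepExternal(List<String> lines) {
        return lines.stream()
                .filter(InternalAddressFilter::isExternal)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines for illustration only.
        List<String> sample = List.of(
                "10.10.10.1 - - [10/Oct/2000:13:55:36 -0700] \"GET /health HTTP/1.0\" 200 12",
                "192.0.2.7 - - [10/Oct/2000:13:55:37 -0700] \"GET /index.html HTTP/1.0\" 200 2326");
        keepExternal(sample).forEach(System.out::println);
    }
}
```

In a Spark job the same check could be passed to the `filter` method of a `JavaRDD<String>` of log lines, for example `lines.filter(InternalAddressFilter::isExternal)`, before the traffic is summarised. Keeping the predicate free of Spark types means it can be unit tested without a running cluster.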

Notes

  • Please do not fork this repository or submit answers as a Pull Request. Test responses should be submitted via email.
