Projects on various BigData platforms

Projects

This repository hosts the following projects; more details can be found in each project's individual README.

Clicking one of the following links takes you directly to the project's module.

Skyline operator implemented in HadoopMR.

Distributed Bloom Filter and Count-Min sketches in Apache Storm.

Scheduling workloads in Spark, Flink, Apex and GPUs based on various metrics.

Calculating the Jaccard Index of terms and categories using a Per-Split SemiJoin algorithm in HadoopMR.
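The skyline operator keeps only the points that are not dominated by any other point. A point dominates another if it is no worse on every dimension and strictly better on at least one. The following is a minimal single-machine sketch in Java for illustration only, not the repository's HadoopMR implementation. It assumes lower values are better on every dimension and uses a block-nested-loop style window. The class `SkylineSketch` and its methods, including the hypothetical `mergeLocalSkylines` helper, are names invented for this example. That helper shows one common way to distribute the work: compute a local skyline per partition (e.g. per mapper), then take the skyline of the union of the local skylines. This yields the global skyline, because any globally dominated point is also dominated by some point that survives in a local skyline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Minimal skyline sketch (block-nested-loop style), for illustration only.
 * Assumption: smaller values are better on every dimension.
 */
public class SkylineSketch {

    /** True if a dominates b: a is no worse on all dimensions and strictly better on at least one. */
    public static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false; // a is worse on dimension i, so it cannot dominate b
            }
            if (a[i] < b[i]) {
                strictlyBetter = true;
            }
        }
        return strictlyBetter;
    }

    /** Keeps a window of points that no other point seen so far dominates. */
    public static List<double[]> skyline(List<double[]> points) {
        List<double[]> window = new ArrayList<>();
        for (double[] p : points) {
            boolean dominated = false;
            for (double[] w : window) {
                if (dominates(w, p)) {
                    dominated = true;
                    break;
                }
            }
            if (dominated) {
                continue; // p can never be part of the skyline
            }
            // p survives: evict every window point that p dominates, then keep p
            window.removeIf(q -> dominates(p, q));
            window.add(p);
        }
        return window;
    }

    /** Hypothetical merge step: the skyline of the union of local skylines is the global skyline. */
    public static List<double[]> mergeLocalSkylines(List<List<double[]>> localSkylines) {
        List<double[]> union = new ArrayList<>();
        for (List<double[]> local : localSkylines) {
            union.addAll(local);
        }
        return skyline(union);
    }

    public static void main(String[] args) {
        // Hypothetical 2-D data split into two partitions, e.g. (price, distance).
        List<double[]> partA = Arrays.asList(new double[]{1, 9}, new double[]{3, 3}, new double[]{4, 4});
        List<double[]> partB = Arrays.asList(new double[]{2, 2}, new double[]{9, 1}, new double[]{5, 6});

        List<double[]> global = mergeLocalSkylines(Arrays.asList(skyline(partA), skyline(partB)));
        for (double[] p : global) {
            System.out.println(Arrays.toString(p));
        }
        // Prints:
        // [1.0, 9.0]
        // [2.0, 2.0]
        // [9.0, 1.0]
    }
}
```

Note that (3, 3) survives in the local skyline of the first partition but is removed during the merge, because (2, 2) from the second partition dominates it. This is why the merge step has to re-run the dominance check instead of simply concatenating the local results.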

Frameworks used

Links redirect to each framework's download page.

Elasticsearch (entire ELK stack)

Docker

The docker folder in the root directory contains docker-compose.yml files for some of the frameworks used in these projects. Docker is particularly useful when complex networking is involved or when rapid prototyping is needed.

Structure

Each module may contain further submodules, usually one per implementation (e.g. Spark, Hadoop, ...).

Building

This repository uses Maven 3 to build its submodules. To build all of them, run the following from the root of the repository:

mvn clean package

Each submodule will then contain a target directory with that module's uber JAR.

To build just a single artifact (e.g. the Hadoop implementation of the skyline), run:

mvn clean package -pl :hadoopSkyline

About

Projects on Spark and Hadoop


Languages

Java 99.6%, Python 0.3%, Cuda 0.1%