will-molloy / MapReduce-K-means-image-processing

K-means image/video data clustering via MapReduce using Apache Spark. SOFTENG751 High Performance Computing (A+)

"Big Data" processing with MapReduce framework

K-means (main) implementation

  • This processes images to determine the most common colour values.
  • The source code for the k-means implementation is found under the k-means directory.
  • This includes instructions to run.
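As a rough illustration of how k-means fits the MapReduce pattern, the sketch below runs the map, reduce and update steps on a list of RGB pixels in plain Python. It is not the repository's Spark code: the function names (`nearest`, `kmeans_step`, `kmeans`) and the choice of squared Euclidean distance in RGB space are assumptions made for this example.

```python
def nearest(pixel, centroids):
    """Index of the centroid closest to pixel (squared Euclidean distance in RGB)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, centroids[i])),
    )


def kmeans_step(pixels, centroids):
    """One k-means iteration expressed as map, reduce and update phases."""
    # Map: emit (cluster index, (pixel, 1)) for every pixel.
    mapped = [(nearest(px, centroids), (px, 1)) for px in pixels]

    # Reduce: sum the colour components and the counts per cluster index.
    totals = {}
    for idx, (px, n) in mapped:
        acc, count = totals.get(idx, ((0, 0, 0), 0))
        totals[idx] = (tuple(a + p for a, p in zip(acc, px)), count + n)

    # Update: each centroid becomes the mean of its pixels.
    # A cluster that received no pixels keeps its previous centroid.
    return [
        tuple(s / totals[i][1] for s in totals[i][0]) if i in totals else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(pixels, centroids, iterations=10):
    """Repeat kmeans_step until the centroids stop changing or iterations run out."""
    for _ in range(iterations):
        updated = kmeans_step(pixels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical input: two reddish and two bluish pixels, two starting centroids.
pixels = [(250, 0, 0), (254, 0, 0), (0, 0, 250), (0, 0, 254)]
result = kmeans(pixels, [(255, 0, 0), (0, 0, 255)])
```

In a distributed setting the map and reduce phases would run in parallel across partitions of the pixel data, and only the small list of centroids would need to be shared between iterations.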

Reddit comment implementation

  • This calculates the average comment score per subreddit and was used to compare frameworks.
  • The source code for the Reddit comment implementations is found under the reddit-comments directory.
  • This has been grouped by framework (CouchDB, Hadoop, Spark, Cloud Haskell).
  • The sequential Java version is found within the Hadoop source code or here.
  • The data set is taken from here. We uncompressed it and took the first 20,000,000 lines (approx. 11 GB of JSON).
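The per-subreddit average can be sketched sequentially in a few lines of Python. This is only an illustration of the computation, not any of the repository's implementations: it assumes one JSON object per line with `subreddit` and `score` fields, and the function name `average_scores` is made up for this example.

```python
import json


def average_scores(lines):
    """Average comment score per subreddit from an iterable of JSON lines."""
    # Map + reduce in one pass: keep a running (score sum, comment count) per subreddit.
    totals = {}
    for line in lines:
        comment = json.loads(line)
        key = comment["subreddit"]  # assumed field name
        score_sum, count = totals.get(key, (0, 0))
        totals[key] = (score_sum + comment["score"], count + 1)  # assumed field name

    # Final step: divide each sum by its count.
    return {sub: s / n for sub, (s, n) in totals.items()}


# Hypothetical sample input in the assumed format.
sample = [
    '{"subreddit": "a", "score": 2}',
    '{"subreddit": "a", "score": 4}',
    '{"subreddit": "b", "score": -1}',
]
averages = average_scores(sample)
```

Carrying a (sum, count) pair rather than a running average keeps the reduce step associative, which is what allows the same computation to be split across workers in a MapReduce framework.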

Runnables

  • The latest binaries for all implementations are found zipped on the releases page.
  • This includes input images/video (see the resources directory) and instructions to run so you can reproduce our results.


Image credit: http://www.well-typed.com/blog/73/

About

License: GNU General Public License v3.0


Languages

  • Scala 60.4%
  • Java 17.3%
  • Haskell 16.1%
  • Python 3.7%
  • Shell 1.5%
  • JavaScript 1.0%