ksameersrk / hadoop-kmeans

A Python implementation of the k-means clustering algorithm in MapReduce.

hadoop-kmeans

A Python implementation of the k-means clustering algorithm in MapReduce.

  1. Hadoop Installation
  2. Dataset Creation
    1. createDataset.py
    2. Plot of data points
  3. K-means Clustering Algorithm
    1. Instructions for running k-means in Cloudera
    2. run.sh & reader.py
      1. run.sh
      2. reader.py
    3. MapReduce
      1. mapper.py
      2. reducer.py
    4. Plot Representation
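
The outline above lists a mapper and a reducer but does not show them. As a rough illustration of how one k-means iteration splits into those two steps, here is a minimal in-memory sketch. It is not the repository's `mapper.py` or `reducer.py`: the function names, the sample points, and the starting centroids are all assumptions made for this example, and a real Hadoop job would read and write key/value lines rather than Python objects.

```python
import math
from collections import defaultdict


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_points(points, centroids):
    """Map step (hypothetical helper): emit (centroid_index, point) per input point."""
    for point in points:
        yield nearest_centroid(point, centroids), point


def reduce_centroids(pairs):
    """Reduce step (hypothetical helper): average the points grouped under each index."""
    groups = defaultdict(list)
    for index, point in pairs:
        groups[index].append(point)
    return {
        index: tuple(sum(coord) / len(members) for coord in zip(*members))
        for index, members in groups.items()
    }


# Made-up sample data: two loose groups of 2-D points and two starting centroids.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

# One iteration: assign every point, then recompute each centroid as a mean.
new_centroids = reduce_centroids(map_points(points, centroids))
print(new_centroids)
```

Repeating the map and reduce steps with `new_centroids` as input until the centroids stop moving gives the usual k-means loop. Note that in this sketch a centroid that attracts no points simply drops out of the result; a fuller version would need to decide how to handle that case.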

License: GNU General Public License v3.0


Languages

TeX 87.7%, Python 11.4%, Shell 1.0%