=
- Gen.py
- Stage 1: Canopy Center
- mapperStg1.py
- reducerStg1.py
- Stage 2: Canopy Assign
- mapperStg2.py
- reducerStg2.py
- Stage 3: Cluster Center
- mapperStg3.py
- reducerStg3.py
- Stage 4: Cluster Assign
- mapperStg4.py
- reducerStg4.py
*The functions of each file will be documented in more detail at a later date.
=
-> Gen.py generates the data set on which we run canopy clustering.
-> Gen.py also generates a set of k centroids.
-> DataPoint class.
- Stage 1: Canopy Center
- Mapper:
- Input: Data points.
- Output: List of Canopy Centers.
- Function:
- Reducer:
- Input: Canopy Centers
- Output: Canopy Centers
- Function:
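The Function fields for Stage 1 are still empty, so the following is only a minimal sketch of how canopy centers are commonly selected, not necessarily what mapperStg1.py does. The threshold `T2`, the comma-separated input format, and the helper names `distance`, `parse_point`, and `canopy_centers` are all assumptions made for illustration.

```python
import math

# Hypothetical tight threshold; the README does not state the value used.
T2 = 1.0


def distance(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def parse_point(line):
    """Parse one line such as '1.5,2.0' into a tuple of floats (assumed format)."""
    return tuple(float(v) for v in line.strip().split(","))


def canopy_centers(points, t2=T2):
    """Greedy single pass: a point becomes a new center only if it lies
    farther than t2 from every center chosen so far."""
    centers = []
    for p in points:
        if all(distance(p, c) > t2 for c in centers):
            centers.append(p)
    return centers
```

Under this sketch, a mapper would apply `canopy_centers` to its share of the data points, and a reducer could apply the same function to the centers emitted by all mappers, which is consistent with the reducer taking canopy centers as input and producing canopy centers as output.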
- Stage 2: Canopy Assign
- Mapper:
- Input: Canopy Centers
- Output: Canopy Centers and the Data Points that belong to each.
- Function:
- Reducer:
- Input: Canopy Centers, Data Points (stdin)
- Output: Identity
- Function: Echoes the result from the Mapper.
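As a rough illustration of the Stage 2 mapper's output (canopy centers and the data points that belong to each), here is a sketch that emits one tab-separated line per center/point pair. The loose threshold `T1`, the `center<TAB>point` line format, and the names `fmt` and `map_points` are assumptions, not taken from mapperStg2.py.

```python
import math

# Hypothetical loose threshold; the README does not state the value used.
T1 = 2.0


def distance(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fmt(point):
    """Render a point as comma-separated coordinates (assumed format)."""
    return ",".join(str(v) for v in point)


def map_points(points, centers, t1=T1):
    """Yield 'center<TAB>point' for every canopy center within t1 of a point.
    A point may be emitted more than once because canopies can overlap."""
    for p in points:
        for c in centers:
            if distance(p, c) <= t1:
                yield f"{fmt(c)}\t{fmt(p)}"
```

Keying each line by the canopy center means that a sort between mapper and reducer groups every canopy's points together, so an identity reducer is enough for this stage.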
- Stage 3: Cluster Center
- Mapper:
- Input:
- -> List of 'k' Centroids
- -> List of Canopy Centers
- -> Canopy Centers, Data Points (stdin)
- Output: K Centroids and the Data Points that belong to each.
- Function:
- Reducer:
- Input:
- Output:
- Function:
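The Stage 3 mapper receives the k centroids, the canopy centers, and the canopy-tagged data points, and outputs each centroid with the points that belong to it. One common way to use canopies at this step is to compare a point only against centroids that share a canopy with it. The sketch below shows that idea under stated assumptions: the threshold `t1`, the fallback to all centroids, and the names `canopies_of` and `nearest_centroid` are hypothetical and may differ from mapperStg3.py.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopies_of(point, canopy_centers, t1):
    """Indices of all canopies whose center lies within t1 of the point."""
    return {i for i, c in enumerate(canopy_centers) if distance(point, c) <= t1}


def nearest_centroid(point, centroids, canopy_centers, t1):
    """Index of the nearest centroid that shares at least one canopy with
    the point. Falls back to all centroids when none shares a canopy
    (an assumption made here so that every point gets assigned)."""
    point_canopies = canopies_of(point, canopy_centers, t1)
    candidates = [
        i for i, c in enumerate(centroids)
        if point_canopies & canopies_of(c, canopy_centers, t1)
    ]
    if not candidates:
        candidates = list(range(len(centroids)))
    return min(candidates, key=lambda i: distance(point, centroids[i]))
```

Restricting the candidates this way is what saves distance computations compared with plain k-means, since most centroids are never compared against a given point.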
- Stage 4: Cluster Assign
- Mapper:
- Input:
- Output:
- Function:
- Reducer:
- Input:
- Output:
- Function:
To reproduce a run:
Edit the run.sh shell script for your setup, then execute it.
Note:
If you are running on Windows cmd, you have to write your own Sort function to sort the mapper's output before it reaches the reducer. Personally, I'd recommend just using a Linux OS to keep things simple.
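For anyone who does stay on Windows, a sort step can be written in a few lines of Python. This is a sketch only: it assumes the mappers emit one `key<TAB>value` record per line, and the function names `sort_lines` and `main` are made up for illustration.

```python
import sys


def sort_lines(lines):
    """Return non-empty records sorted by key, where the key is the text
    before the first tab. Records with equal keys keep their input order."""
    records = [line.rstrip("\r\n") for line in lines if line.strip()]
    return sorted(records, key=lambda r: r.split("\t", 1)[0])


def main(stdin=None, stdout=None):
    """Read records from stdin, write them back out sorted by key."""
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    for record in sort_lines(stdin):
        stdout.write(record + "\n")

# To use this as a script, add a call to main() at the bottom of the file
# and place the script between a mapper and its reducer in the pipeline.
```

Saved under a name of your choosing and given a `main()` call, the script would sit wherever the pipeline otherwise relies on a sort between a mapper and its reducer, for example `python mapperStg1.py | python yoursort.py | python reducerStg1.py`, where `yoursort.py` is a placeholder name.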