K-Means

Execution Data

Kmeans-Serial

Base code comes from https://github.com/marcoscastro/kmeans
Keeps track of:
- points + what cluster they are in
- clusters + what points are in them
Recomputes each cluster mean from scratch (sum over the cluster's points, then divide by the point count) at the end of each point-association pass
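
The mean update described above can be sketched roughly as follows. This is a minimal illustration, not the reference repository's code: the names Point, MemberCluster and recompute_means are hypothetical, and keeping the old mean for an empty cluster is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not the reference repository's classes.
using Point = std::vector<double>;

struct MemberCluster {
    Point mean;                        // current centroid
    std::vector<std::size_t> members;  // indices of the points assigned to this cluster
};

// Recompute every cluster mean from scratch by summing over its member points.
void recompute_means(const std::vector<Point>& points,
                     std::vector<MemberCluster>& clusters) {
    for (MemberCluster& c : clusters) {
        if (c.members.empty()) continue;  // assumption: an empty cluster keeps its old mean
        Point sum(c.mean.size(), 0.0);
        for (std::size_t idx : c.members) {
            assert(idx < points.size());
            for (std::size_t d = 0; d < sum.size(); ++d) sum[d] += points[idx][d];
        }
        for (std::size_t d = 0; d < sum.size(); ++d)
            c.mean[d] = sum[d] / static_cast<double>(c.members.size());
    }
}
```

Every pass touches every point a second time just to rebuild the sums, which is the cost the next version avoids.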

Better-Kmeans-Serial

Updated version of the reference code above
Keeps track of:
- points + what cluster they are in
- clusters + the intermediate sums from point addition and removal
Adds each point to (or removes it from) its cluster's running sum as the points are iterated
- The mean is computed sequentially from these sums at the end
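
A rough sketch of the running-sum bookkeeping, under assumptions: SumCluster, make_cluster, move_point and finalize_means are hypothetical names, and -1 is used here to mean "not yet assigned". The actual code may organize this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Hypothetical cluster record: no member list, only the intermediate sums.
struct SumCluster {
    Point mean;             // current centroid
    Point sum;              // running sum of the coordinates of assigned points
    std::size_t count = 0;  // number of assigned points
};

SumCluster make_cluster(const Point& initial_mean) {
    SumCluster c;
    c.mean = initial_mean;
    c.sum.assign(initial_mean.size(), 0.0);
    return c;
}

// Move point p from cluster `from` to cluster `to` by adjusting the running sums.
// Assumption: from == -1 means the point was not assigned to any cluster yet.
void move_point(const Point& p, int from, int to, std::vector<SumCluster>& clusters) {
    assert(to >= 0 && static_cast<std::size_t>(to) < clusters.size());
    if (from == to) return;
    if (from >= 0) {
        SumCluster& old_c = clusters[from];
        for (std::size_t d = 0; d < p.size(); ++d) old_c.sum[d] -= p[d];
        --old_c.count;
    }
    SumCluster& new_c = clusters[to];
    for (std::size_t d = 0; d < p.size(); ++d) new_c.sum[d] += p[d];
    ++new_c.count;
}

// Sequential step at the end of a pass: turn the sums into means.
void finalize_means(std::vector<SumCluster>& clusters) {
    for (SumCluster& c : clusters) {
        if (c.count == 0) continue;  // assumption: an empty cluster keeps its old mean
        for (std::size_t d = 0; d < c.mean.size(); ++d)
            c.mean[d] = c.sum[d] / static_cast<double>(c.count);
    }
}
```

Only points that actually change cluster cost any work in the sums, so late iterations, where few points move, become cheap.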

Kmeans-Parallel

Parallelized version of the Better-Kmeans-Serial code
Parallelization is done over the points rather than over the clusters: there are typically far more points than clusters, so there is a greater chance of sufficient parallel slack
Data contention is handled through the use of thread-local storage
Keeps track of:
- points + what cluster they are in
- clusters + the intermediate sums from point addition and removal
- thread-local storage: the per-thread sums of added points, the number of points added to each cluster, and the cluster switches made by points
Each thread's local storage is then resolved into the shared clusters: the intermediate sums go to the appropriate cluster, and the point additions and removals are applied so that the mean is calculated correctly
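
One way the thread-local scheme could look is sketched below. The notes above do not name the threading runtime, so OpenMP is used purely as an assumption, and SharedCluster, LocalDelta, nearest and assign_step are hypothetical names. Compile with OpenMP enabled (for example -fopenmp on GCC or Clang); without it the pragmas are ignored and the same code runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

struct SharedCluster {
    Point mean;      // read-only during the parallel pass
    Point sum;       // intermediate sum, updated only in the merge step
    long count = 0;  // number of assigned points
};

// Per-thread scratch space: signed deltas to apply to each shared cluster.
struct LocalDelta {
    std::vector<Point> sum;   // net coordinates added (+) or removed (-) per cluster
    std::vector<long> count;  // net points added or removed per cluster
    long switches = 0;        // points that changed cluster on this thread
    LocalDelta(std::size_t k, std::size_t dim) : sum(k, Point(dim, 0.0)), count(k, 0) {}
};

// Index of the cluster whose mean is closest (squared Euclidean distance).
int nearest(const Point& p, const std::vector<SharedCluster>& clusters) {
    assert(!clusters.empty());
    int best = 0;
    double best_d2 = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < clusters.size(); ++c) {
        double d2 = 0.0;
        for (std::size_t d = 0; d < p.size(); ++d) {
            const double diff = p[d] - clusters[c].mean[d];
            d2 += diff * diff;
        }
        if (d2 < best_d2) { best_d2 = d2; best = static_cast<int>(c); }
    }
    return best;
}

// One pass over the points; returns how many points switched cluster.
// Assumption: assignment[i] == -1 means point i is not assigned yet.
long assign_step(const std::vector<Point>& points, std::vector<int>& assignment,
                 std::vector<SharedCluster>& clusters) {
    const std::size_t k = clusters.size();
    const std::size_t dim = clusters[0].mean.size();
    long total_switches = 0;

    #pragma omp parallel
    {
        LocalDelta local(k, dim);  // private to each thread, so no contention

        #pragma omp for
        for (long i = 0; i < static_cast<long>(points.size()); ++i) {
            const int from = assignment[i];
            const int to = nearest(points[i], clusters);
            if (from == to) continue;
            for (std::size_t d = 0; d < dim; ++d) {
                if (from >= 0) local.sum[from][d] -= points[i][d];
                local.sum[to][d] += points[i][d];
            }
            if (from >= 0) --local.count[from];
            ++local.count[to];
            ++local.switches;
            assignment[i] = to;  // each index i is written by exactly one thread
        }

        // Resolve this thread's deltas into the shared clusters, one thread at a time.
        #pragma omp critical
        {
            for (std::size_t c = 0; c < k; ++c) {
                for (std::size_t d = 0; d < dim; ++d) clusters[c].sum[d] += local.sum[c][d];
                clusters[c].count += local.count[c];
            }
            total_switches += local.switches;
        }
    }

    // Sequential mean update from the merged sums.
    for (SharedCluster& c : clusters) {
        if (c.count == 0) continue;  // assumption: an empty cluster keeps its old mean
        for (std::size_t d = 0; d < dim; ++d)
            c.mean[d] = c.sum[d] / static_cast<double>(c.count);
    }
    return total_switches;
}
```

Calling assign_step repeatedly until it returns 0 gives a simple convergence loop. The shared sums are only written inside the critical section, so the hot loop over points needs no locking.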