Pure Spark for engagement experimentation
[ ] Cluster into 500 main clusters
[ ] Take every main cluster with size > 200 and recluster it into 10 sub-clusters. E.g. if 50 of the 500 main clusters have size > 200, you end up with 450 + 50*10 = 950 clusters
[ ] Create a plot for the final result: centroid distance + max + max, and centroid distance - max - max (each max presumably being a cluster's max point-to-center distance)
[ ] Count1: count of values on the left larger than 0.000001
[ ] Count2: count of values on the right smaller than 1.34
[ ] Count3: count of values on the left larger than 1.48
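
A minimal sketch of the arithmetic in the checklist above, in plain Python rather than Spark so it can be checked locally before porting. All function names and the sample values are hypothetical. `pair_bounds` assumes that each "max" is one cluster's max point-to-center distance, which the notes do not state explicitly, and the cluster count assumes every reclustering yields exactly 10 non-empty sub-clusters.

```python
def final_cluster_count(sizes, threshold=200, n_sub=10):
    """Clusters left after splitting every cluster larger than `threshold`
    into `n_sub` sub-clusters (assumes each split yields exactly n_sub)."""
    large = sum(1 for s in sizes if s > threshold)
    return (len(sizes) - large) + large * n_sub


def pair_bounds(centroid_dist, max_a, max_b):
    """Assumed reading of the plotted quantities for a pair of clusters:
    (centroid distance + max + max, centroid distance - max - max)."""
    return centroid_dist + max_a + max_b, centroid_dist - max_a - max_b


def count_larger(values, threshold):
    """Count values strictly larger than `threshold`."""
    return sum(1 for v in values if v > threshold)


def count_smaller(values, threshold):
    """Count values strictly smaller than `threshold`."""
    return sum(1 for v in values if v < threshold)


# Example from the notes: 50 of 500 main clusters have size > 200.
sizes = [250] * 50 + [100] * 450
total = final_cluster_count(sizes)

# Hypothetical sample values standing in for the "left" and "right" series.
left_values = [0.0, 0.5, 1.5, 2.0]
right_values = [1.0, 1.34, 1.2, 3.0]

count1 = count_larger(left_values, 0.000001)
count2 = count_smaller(right_values, 1.34)
count3 = count_larger(left_values, 1.48)
```

The comparisons are strict, so a value exactly equal to a threshold (such as 1.34 above) is not counted; switch to `>=` or `<=` if the intended counts are inclusive.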
#measures
[ ] Max point to center
[ ] Average point to center
[ ] Cluster sizes
[ ] Cluster centroids
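
The four measures above can be sketched per cluster as follows. This is a plain-Python illustration, not the Spark job itself: it assumes Euclidean distance and that the centroid is the mean of the cluster's points, and `cluster_measures` and the sample data are hypothetical names and values.

```python
import math
from collections import defaultdict


def cluster_measures(points, labels):
    """Return size, centroid, max and average point-to-center distance
    for each cluster label (Euclidean distance, mean centroid assumed)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    measures = {}
    for label, pts in groups.items():
        dim = len(pts[0])
        # Centroid as the coordinate-wise mean of the cluster's points.
        centroid = tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))
        dists = [math.dist(p, centroid) for p in pts]
        measures[label] = {
            "size": len(pts),
            "centroid": centroid,
            "max_dist": max(dists),
            "avg_dist": sum(dists) / len(dists),
        }
    return measures


# Hypothetical toy data: two points in cluster 0, one point in cluster 1.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
labels = [0, 0, 1]
measures = cluster_measures(points, labels)
```

In Spark the same quantities would come from a group-by on the cluster id; the dictionary layout here is only for readability.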