MDAV(Maximum Distance to Average Vector) clustering implementation in java
Algo used in anonymisation to compute clusters for micro aggregation
Algo
-
Compute the average record of all records, find most distant record from average (xr)
-
Find most distant record xs from xr
-
Compute 2 clusters around xr and xs (closest points)
-
if there is more than 3k records, repeat 1.
-
if there is between 3k-1 2k records, compute average
5.1. find most distant xr from average
5.2. compute a cluster arround xr
-
remaining points are the last cluster