Bug Report: Slower than k-means on `n=10,000` moon dataset

Question

motiwari opened this issue a year ago · comments

Original comment: https://news.ycombinator.com/reply?id=35464068&goto=item%3Fid%3D35445312%2335464068

Hi Mo, thanks for this work. It seems interesting.
I had the chance to play with it a little and wanted to compare it with KMeans, using the sklearn KMeans implementation.

I ran some examples (mostly the ones already available). One interesting thing I did was generate isotropic Gaussian blobs for clustering (using `make_blobs`) and compare the two methods. With `n_samples=1000`, BanditPAM was a little better on a couple of the metrics I used, and also much faster. When I increased it to `n_samples=10000`, I found that it is much slower than KMeans; see [1], and the code is in [2]. Is there a particular reason for that?

[1] https://imgur.com/a/VibpgNz

[2] https://paste.elashri.xyz/aXCE
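
For readers who cannot open [2], here is a minimal sketch of the kind of comparison described above. It is not the code from [2]. The helper names (`time_fit`, `compare`), the number of clusters, the seed, and the use of silhouette score as the metric are all assumptions made for illustration. The BanditPAM calls (`KMedoids(n_medoids=..., algorithm="BanditPAM")`, `fit(X, "L2")`, `labels`) follow the package's README as I understand it and may differ between versions; if the package is not installed, only KMeans is timed.

```python
import time

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

try:
    # Optional dependency: pip install banditpam
    from banditpam import KMedoids
except ImportError:
    KMedoids = None


def time_fit(fit_fn):
    """Run fit_fn once and return (labels, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    labels = fit_fn()
    return labels, time.perf_counter() - start


def compare(n_samples, n_clusters=5, seed=0):
    """Time KMeans (and BanditPAM, if available) on isotropic Gaussian blobs.

    Returns a dict mapping method name to (seconds, silhouette score).
    n_clusters, seed, and the silhouette metric are illustrative choices.
    """
    X, _ = make_blobs(n_samples=n_samples, centers=n_clusters, random_state=seed)
    # Score on a subsample so the metric itself stays cheap for large n.
    sample_size = min(1000, n_samples)
    results = {}

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels, secs = time_fit(lambda: km.fit_predict(X))
    score = silhouette_score(X, labels, sample_size=sample_size, random_state=seed)
    results["KMeans"] = (secs, score)

    if KMedoids is not None:
        # Assumed API, taken from the BanditPAM README.
        kmed = KMedoids(n_medoids=n_clusters, algorithm="BanditPAM")

        def fit_banditpam():
            kmed.fit(X, "L2")
            return kmed.labels

        labels, secs = time_fit(fit_banditpam)
        score = silhouette_score(X, labels, sample_size=sample_size, random_state=seed)
        results["BanditPAM"] = (secs, score)

    return results
```

Calling `compare(1000)` and then `compare(10000)` and printing the returned dicts gives the two timings side by side. A single run per size is noisy, so repeating each call a few times would make the comparison more reliable.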