ssarfraz / FINCH-Clustering

Source Code for FINCH Clustering Algorithm

Is there any randomness in the clustering results?

Classmate-Huang opened this issue

Hi, I fixed the random seed and the input data and then applied FINCH for clustering. However, the results differ from one run to the next. What should I do to get the same result every time?

P.S. I have a large amount of data (hundreds of thousands) and use the NNDescent method from 'pynndescent'. Could this be the cause? What can I do?

Looking forward to your reply. Thank you very much.

Yes, that's because of the ANN method (pynndescent in this case). It should not have much impact on the final cluster purity, but to get consistent results on every run there should be a way to fix the random seed. Looking at the pynndescent implementation, it seems to take a 'random_state' parameter: https://pynndescent.readthedocs.io/en/latest/api.html#pynndescent . You may try it. If it is well implemented, this should fix the issue.