Minor performance difference between BanditPAM and sklearn for a small number of data points
lukeleeai opened this issue
Description:
I have observed a performance difference between BanditPAM and sklearn on the MNIST dataset for small n (below 20000). BanditPAM is slower than sklearn at these sizes: in the runs below it is about 4x slower at n = 1000 and about 1.3x slower at n = 10000.
Reproducibility:
You can reproduce the results by running the code available in the branch "sklearn_comparison" of the BanditPAM repository. To run the experiment, execute the following command:
python experiments/run_scaling_experiment.py
I've installed banditpam with pip install banditpam.
You should then see results similar to the following:
Num data: 1000
<Running SKLEARN >
0.19861984252929688
<Running BanditPAM VA with caching >
0.8404459953308105
Num data: 10000
<Running SKLEARN >
15.577669143676758
<Running BanditPAM VA with caching >
20.48973298072815
For larger n, however, BanditPAM outperforms sklearn (about 1.4x faster at n = 20000):
Num data: 20000
<Running SKLEARN >
42.05375599861145
<Running BanditPAM VA with caching >
29.887195110321045
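To make the comparison easier to read, the timings above can be converted into ratios. The following is a minimal sketch and not part of the repository: the helper names parse_timings and slowdown are hypothetical, and it assumes each printed number is an elapsed time in the same unit for both libraries.

```python
import re

# Log text copied verbatim from the output shown in this issue.
LOG = """\
Num data: 1000
<Running SKLEARN >
0.19861984252929688
<Running BanditPAM VA with caching >
0.8404459953308105
Num data: 10000
<Running SKLEARN >
15.577669143676758
<Running BanditPAM VA with caching >
20.48973298072815
Num data: 20000
<Running SKLEARN >
42.05375599861145
<Running BanditPAM VA with caching >
29.887195110321045
"""


def parse_timings(log):
    """Hypothetical helper: return {n: {label: elapsed}} from the log format above."""
    results = {}
    n = None
    label = None
    for raw in log.splitlines():
        line = raw.strip()
        if not line:
            continue
        size = re.match(r"Num data:\s*(\d+)", line)
        if size:
            n = int(size.group(1))
            results[n] = {}
            continue
        run = re.match(r"<Running\s+(.*?)\s*>", line)
        if run:
            label = run.group(1)
            continue
        # Any remaining line is assumed to be the elapsed time of the last run.
        results[n][label] = float(line)
    return results


def slowdown(results, baseline="SKLEARN", candidate="BanditPAM VA with caching"):
    """Hypothetical helper: candidate time divided by baseline time, per n."""
    return {n: t[candidate] / t[baseline] for n, t in results.items()}


ratios = slowdown(parse_timings(LOG))
for n, ratio in sorted(ratios.items()):
    # A ratio above 1 means BanditPAM took longer than sklearn at that n.
    print(f"n={n}: BanditPAM / sklearn = {ratio:.2f}")
```

A ratio above 1 means BanditPAM was slower at that size, and a ratio below 1 means it was faster. These are single runs, so the ratios are only a rough indication.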