Minor performance difference between BanditPAM and sklearn for a small number of data points

Question


lukeleeai opened this issue a year ago

Description:
I have observed a small performance difference between BanditPAM and sklearn on the MNIST dataset for n < 20000: in that range, BanditPAM is slower than sklearn, although the gap is small in absolute terms.

Reproducibility:
You can reproduce the results with the code in the "sklearn_comparison" branch of the BanditPAM repository. I installed BanditPAM with:

pip install banditpam

To run the experiment, execute the following command:

python experiments/run_scaling_experiment.py
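Very short timings, such as the ones at n = 1000, can be affected by one-off costs and run-to-run noise, so it may help to repeat each fit and report the best and median times. This is a minimal standard-library sketch, not part of the experiment script; `run_sklearn` and `run_banditpam` are hypothetical placeholders whose bodies would need to be replaced with the actual fits.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_callable(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run fn `repeats` times; return the best and median wall-clock time in seconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {"best": min(samples), "median": statistics.median(samples)}


# Hypothetical placeholders: replace the bodies with the real sklearn / BanditPAM fits.
def run_sklearn() -> None:
    sum(i * i for i in range(10_000))


def run_banditpam() -> None:
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    for name, fn in [("sklearn", run_sklearn), ("BanditPAM", run_banditpam)]:
        stats = time_callable(fn, repeats=5)
        print(f"{name}: best {stats['best']:.4f} s, median {stats['median']:.4f} s")
```

Reporting the best run filters out transient slowdowns, while the median shows typical behaviour.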

Running the experiment script produces results similar to the following:

Num data:  1000

<Running  SKLEARN >
0.19861984252929688

<Running  BanditPAM VA with caching >
0.8404459953308105

Num data:  10000

<Running  SKLEARN >
15.577669143676758

<Running  BanditPAM VA with caching >
20.48973298072815

Fortunately, for larger n, BanditPAM clearly outperforms sklearn:

Num data:  20000

<Running  SKLEARN >
42.05375599861145

<Running  BanditPAM VA with caching >
29.887195110321045
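To make the comparison concrete, the reported timings can be turned into BanditPAM-to-sklearn ratios. This short sketch simply hard-codes the numbers printed above; the dictionary layout and the `slowdown` helper are illustrative and not part of the experiment script.

```python
# Timings copied from the output above (BanditPAM VA with caching vs sklearn).
timings = {
    1000: {"sklearn": 0.19861984252929688, "banditpam": 0.8404459953308105},
    10000: {"sklearn": 15.577669143676758, "banditpam": 20.48973298072815},
    20000: {"sklearn": 42.05375599861145, "banditpam": 29.887195110321045},
}


def slowdown(times):
    """Ratio of BanditPAM time to sklearn time; > 1 means BanditPAM is slower."""
    return times["banditpam"] / times["sklearn"]


for n, times in sorted(timings.items()):
    ratio = slowdown(times)
    verdict = "slower" if ratio > 1 else "faster"
    print(f"n={n:>6}: BanditPAM/sklearn = {ratio:.2f} (BanditPAM {verdict})")
```

With these numbers, BanditPAM takes roughly 4.2 times as long as sklearn at n = 1000 and about 1.3 times as long at n = 10000, but only about 0.71 times as long at n = 20000, so in these runs the crossover falls somewhere between n = 10000 and n = 20000.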