ZJULearning / efanna

fast library for ANN search and KNN graph construction

Comparing to HNSW in NMSLIB

searchivarius opened this issue · comments

Hi guys,

nice work. It's great to see that the idea of using proximity graphs is becoming more popular. However, you don't cover all of the important state of the art. Please consider comparing to at least the following libraries, which beat FALCONN by an order of magnitude or more:

  1. https://github.com/searchivarius/nmslib
  2. https://github.com/DBWangGroupUNSW/nns_benchmark

Thank you!

Many thanks for the good work, especially for the interesting comparison of NN-Descent with LargeVis.

In addition to @searchivarius's comments, I would also recommend not turning off SIMD. Disabling it makes the comparison much less justified. If you have trouble writing SIMD code, I suggest using auto-vectorization in gcc or the Intel compiler; it should work nicely.

@searchivarius Did you mean FLANN (instead of FALCONN)?

Many thanks for your attention!
I didn't compare to other algorithms because the first version is just a naive one without any optimization.
Next I will add OpenMP and SIMD to the ANN search part and compare with the well-known NMSLIB, FALCONN, and so on.
Thank you for your patience!

@searchivarius We tried comparing efanna with nmslib/HNSW and found that, on ANN search, HNSW is much better. Inspired by that, we proposed a new work, "Fast Approximate Nearest Neighbor Search With Navigating Spreading-out Graphs": https://arxiv.org/abs/1707.00143
It outperforms HNSW under a fair comparison. The code will be released soon.

@fc731097343 Many thanks for the paper! Looking forward to seeing the code.
I wonder, have you tried comparing to HNSW in nmslib after index serialization/deserialization (without reloading the data)? It should drastically reduce the HNSW memory footprint.

@yurymalkov Yes, I read the code of HNSW carefully, and likewise many thanks for the code of HNSW. I also tried using the serialization technique to speed up my implementation. I think the comparison was fair: I tuned HNSW to its best and found the best parameters to be M=16, efConstruction=600 for SIFT1M, and M=35, efConstruction=800 for GIST1M. It's strange that when efConstruction gets bigger the performance becomes worse; I cannot explain that so far.