Try PyLucene for ann-benchmarks implementation
alexklibisz opened this issue
Background
The ann-benchmarks project now includes an implementation of Lucene's HNSW models:
- https://github.com/erikbern/ann-benchmarks/blob/master/install/Dockerfile.luceneknn
- https://github.com/erikbern/ann-benchmarks/blob/master/ann_benchmarks/algorithms/luceneknn.py
This is based on PyLucene: https://lucene.apache.org/pylucene/
Elasticsearch adds substantial overhead to the Elastiknn benchmark. A while back, I found that the Elastiknn benchmark spends roughly 40% of its runtime on work other than nearest neighbor search.
Maybe there's a way to integrate Elastiknn's underlying Lucene models with PyLucene to get a more apples-to-apples comparison of the HNSW and LSH models?
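One way to make the ~40% overhead figure concrete is to record two timings for each query: the end-to-end time the benchmark observes, and the time attributable to the nearest neighbor search itself. The overhead is then 1 - (total search time / total runtime), and a PyLucene-based integration would presumably aim to push that fraction toward zero. The sketch below assumes both timings are already available per query (how the search time gets isolated is left open), and `overhead_fraction` is a hypothetical helper, not part of ann-benchmarks or Elastiknn.

```python
from typing import Sequence


def overhead_fraction(total_seconds: Sequence[float], search_seconds: Sequence[float]) -> float:
    """Fraction of benchmark runtime NOT spent in nearest neighbor search.

    total_seconds[i]: end-to-end time for query i as observed by the benchmark.
    search_seconds[i]: portion of that time attributable to the search itself.
    """
    if len(total_seconds) != len(search_seconds):
        raise ValueError("need exactly one search timing per query")
    total = sum(total_seconds)
    if total <= 0:
        raise ValueError("total runtime must be positive")
    search = sum(search_seconds)
    if search > total:
        raise ValueError("search time cannot exceed total runtime")
    # Everything outside the search (client, serialization, network, ...) is overhead.
    return 1.0 - search / total
```

Summing before dividing weights each query by its duration, so a few slow queries count proportionally more than many fast ones, matching a "share of total runtime" reading of the figure.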
Deliverables
- Integrate Elastiknn's Lucene models with PyLucene
- Update the Elastiknn implementation in ann-benchmarks to use the PyLucene implementation
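The second deliverable roughly amounts to implementing ann-benchmarks' algorithm wrapper shape (a `fit(X)` method that builds the index and a `query(q, n)` method that returns the indices of the `n` nearest neighbors) on top of PyLucene instead of the Elasticsearch client. A minimal sketch of that shape follows, assuming the "angular" metric; the class name is hypothetical, and an exact brute-force cosine scan stands in for the PyLucene-backed Elastiknn model. A real wrapper would subclass ann-benchmarks' `BaseANN` and call into PyLucene in `fit` and `query`.

```python
import math
from typing import List, Sequence


class BruteForceStandIn:
    """Hypothetical ann-benchmarks-style wrapper: fit() indexes, query() searches.

    In a PyLucene-based version, fit() would index the vectors into Lucene with
    Elastiknn's model and query() would run the corresponding Lucene query.
    Here an exact cosine-similarity scan stands in for that backend.
    """

    def __init__(self, metric: str = "angular"):
        if metric != "angular":
            raise ValueError("this sketch only supports the 'angular' metric")
        self._vectors: List[List[float]] = []

    @staticmethod
    def _normalize(v: Sequence[float]) -> List[float]:
        # Scale to unit length so a dot product equals cosine similarity.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else list(v)

    def fit(self, X: Sequence[Sequence[float]]) -> None:
        # "Build the index": keep normalized copies of every vector.
        self._vectors = [self._normalize(v) for v in X]

    def query(self, q: Sequence[float], n: int) -> List[int]:
        # Score every indexed vector and return the n best indices, most similar first.
        qn = self._normalize(q)
        scores = [sum(a * b for a, b in zip(qn, v)) for v in self._vectors]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]

    def __str__(self) -> str:
        return "BruteForceStandIn(angular)"


if __name__ == "__main__":
    index = BruteForceStandIn()
    index.fit([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
    print(index.query([1.0, 0.1], 2))  # → [0, 2]
```

Keeping the backend behind `fit`/`query` means the benchmark harness times only those calls, which is the point of swapping Elasticsearch for an in-process PyLucene backend.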
Related Issues
I haven't been able to upgrade the Elastiknn implementation in ann-benchmarks to the latest version of Elastiknn because the Python version used by ann-benchmarks is incompatible with the version required by the Elasticsearch client (erikbern/ann-benchmarks#316).