alexklibisz / elastiknn

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.

Home Page: https://alexklibisz.github.io/elastiknn

Try PyLucene for ann-benchmarks implementation

alexklibisz opened this issue

Background

The ann-benchmarks project now includes an implementation of Lucene's HNSW models, based on PyLucene: https://lucene.apache.org/pylucene/

Elasticsearch introduces a lot of inefficiency into the elastiknn benchmark. A while back, I determined the elastiknn benchmark spends ~40% of its runtime on things other than nearest neighbor search.
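
For reference, one rough way to arrive at a figure like this is to time the full request path and the search step alone, then compare the two. The sketch below is illustrative only: `time_queries`, `overhead_fraction`, and the callables passed to them are hypothetical names, not part of elastiknn or ann-benchmarks, and this is not necessarily how the ~40% number above was measured.

```python
import time
from typing import Callable, Iterable, Sequence


def time_queries(run_query: Callable[[Sequence[float]], object],
                 queries: Iterable[Sequence[float]]) -> float:
    """Return wall-clock seconds spent running run_query over all queries."""
    start = time.perf_counter()
    for q in queries:
        run_query(q)
    return time.perf_counter() - start


def overhead_fraction(total_seconds: float, search_seconds: float) -> float:
    """Fraction of total runtime spent on anything other than the search itself."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= search_seconds <= total_seconds:
        raise ValueError("search_seconds must be between 0 and total_seconds")
    return (total_seconds - search_seconds) / total_seconds


if __name__ == "__main__":
    # Hypothetical numbers: 10 s end to end, 6 s inside the search itself.
    print(f"{overhead_fraction(10.0, 6.0):.0%} of runtime is overhead")
```

In practice, `total_seconds` would come from timing the full client-to-Elasticsearch round trip, and `search_seconds` from timing the underlying Lucene query in isolation.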

Maybe there's a way to integrate the underlying Lucene models with PyLucene to get a more apples-to-apples comparison of the HNSW and LSH models?

Deliverables

  • Integrate Elastiknn's Lucene models with PyLucene
  • Update the Elastiknn implementation in ann-benchmarks to use the PyLucene implementation

Related Issues

I haven't been able to upgrade ann-benchmarks to the latest version of Elastiknn because the Python version used by ann-benchmarks is incompatible with the version required by the elasticsearch client (erikbern/ann-benchmarks#316).