alexklibisz / elastiknn

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.

Home Page: https://alexklibisz.github.io/elastiknn

Try getting rid of HashAndFreq to minimize allocations

alexklibisz opened this issue

Background

Elastiknn currently uses a small POJO called HashAndFreq to represent a particular hash and the number of times (i.e., the frequency) that it should be repeated when searching the index for other vectors. The frequency is only ever > 1 for the Permutation LSH model. I'm curious what happens if I just replace HashAndFreq with a byte array. In other words, how expensive is the allocation of all these POJOs? If the difference is significant, maybe it's best to get rid of HashAndFreq.
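To make the comparison concrete, here is a minimal sketch of the two representations. It is not the actual Elastiknn source: `HashAndFreqSketch`, `HashRepresentations`, `wrapped`, and `bare` are hypothetical names chosen for illustration, and the field layout (a `byte[]` hash plus an `int` frequency) is an assumption based on the description above.

```java
// Hypothetical stand-in for the HashAndFreq POJO described above.
// Assumption: it pairs a hash (as bytes) with a repetition count.
final class HashAndFreqSketch {
    final byte[] hash;
    final int freq;

    HashAndFreqSketch(byte[] hash, int freq) {
        this.hash = hash;
        this.freq = freq;
    }
}

final class HashRepresentations {

    // Wrapped form: allocates one extra small object per hash.
    // For models that never repeat a hash, freq is always 1.
    static HashAndFreqSketch[] wrapped(byte[][] hashes) {
        HashAndFreqSketch[] out = new HashAndFreqSketch[hashes.length];
        for (int i = 0; i < hashes.length; i++) {
            out[i] = new HashAndFreqSketch(hashes[i], 1);
        }
        return out;
    }

    // Bare form: the hashes are passed along as plain byte arrays,
    // so no per-hash wrapper is allocated and the frequency is implicitly 1.
    static byte[][] bare(byte[][] hashes) {
        return hashes;
    }
}
```

The bare form cannot express a frequency greater than 1, which is why dropping the wrapper would affect the Permutation LSH model specifically.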

Deliverables

  • Quickly remove HashAndFreq
  • Benchmark L2LshModel
  • If it's a significant difference, consider removing the Permutation LSH model, or implementing it in a way that doesn't require allocating these POJOs.

Related Issues

No response

I tried this in cabc5bb and then benchmarked it in 4a3ff2c.

The improvement was negligible:

[image: benchmark results]

Introducing this would be a breaking API change (removing repetitions from the Permutation LSH model), which doesn't seem worthwhile for a negligible performance improvement.