alexklibisz / elastiknn

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.

Home Page: https://alexklibisz.github.io/elastiknn

Future of Elastiknn

alexklibisz opened this issue · comments

My current plan is to begin winding down my contributions and support of Elastiknn. I envision the project continuing in "maintenance mode", and I want to give any users a specific heads-up about this. In many ways, this is how I've approached the project for the past year or so. I'm just spelling it out so that expectations are clear.

The project faces several headwinds, not all bad.

Good headwinds:

  1. Elasticsearch and OpenSearch are both investing in ANN implementations. After over a year of development, Lucene and Elasticsearch are finally starting to expose an API for ANN search. There's also a growing ecosystem of ANN search solutions, both open source and proprietary. It's great to see the ecosystem growing.
  2. I have other technical interests I'd like to pursue. Elastiknn has been a great platform for learning and experimenting, but I actually haven't implemented any sort of ANN search professionally since 2017, back when the idea for Elastiknn was born. I'm glad I followed through on implementing this, but, all things considered, my time at this point is better spent on other problems.

Other headwinds:

  1. The project has seen almost no outside contribution. There have been many issues and emails in which I respond, "I don't have time to work on this right this moment, but there's a developer-guide.md and I'm happy to review PRs." I can count on one hand the number of times someone has followed up on this. To be very clear, I have had some help doing maintenance, like upgrading ES versions, for which I'm very appreciative.
  2. It's unclear if anyone is even using this in any consequential way. I see some downloads, but only one person has submitted to the list of users in the readme. For better or worse, I get a lot of satisfaction from knowing my effort translates to solved problems. So the lack of feedback makes it hard to find motivation for this effort compared to some other efforts.

There are a few final tasks I'd like to complete to satisfy my own curiosities:

  1. Upgrade to Elasticsearch 8.x (#348). I'll continue reviewing upgrade PRs if other folks make them, but this will be the last major upgrade which I personally do.
  2. Complete and merge a benchmark implementation based on some of the big-ann-benchmarks datasets (#278). Ideally this would also include an apples/apples comparison to Lucene's HNSW implementation. I'm curious how the numbers play out.
  3. Get rid of the Unsafe vector serialization (#263).

Concretely, once the above are done, here's how I see the future of Elastiknn playing out in "maintenance mode":

  1. I'll continue to review and merge version bump PRs. Based on Elastic's historical cadence, I think getting onto 8.x will lead to relative steady-state for about a year or so.
  2. If someone reports a bug, it'll come through my email and I might comment on how it might be fixed. If it's particularly interesting, and scratches an intellectual itch, I might look into it myself.
  3. I won't pursue big feature additions like #197, #279, #298, #323. If someone submits a PR, I'll give it a look. I'll keep a high standard for the project and won't merge low-quality PRs. It would take some impressive ambition to solve these kinds of problems well.

Really sad to hear this; we're just about to go live with a large implementation of this plugin in our ES cluster. I understand and appreciate the negative "headwinds" you noted. Just a note on why I believe your plugin is winning: it allows filtering before vector comparison, and its better implementation allows aggregations based on a similarity search. Neither of these features is possible on ES8, and OpenSearch's implementation is flawed, as you have documented yourself.

Is this something you are committed to sunsetting?

Also, to note, I really appreciate your final commitments to a "few final tasks" 👍

Hey @bennimmo, thanks for the kind words, and I'm glad the plugin is working well for you. Please consider adding your company to the users list on the readme.

I updated my original post with a more concrete description of how I see the future of the project looking.

On the filtering and aggregations, I think they could probably just copy my implementation. It uses existing primitives that have been around for over two years. It's in Scala, but easily translatable to Java (w/ a bit more boilerplate). Hopefully the folks at Elastic and Amazon have some grander vision, but also no shame in copying this stuff.

@alexklibisz Alation, Inc. is also in the process of rolling out Elastiknn. I appreciate all your hard work on this plugin and especially your willingness to be open and clear with your plans. Since we haven't gone live with the related feature, I don't want to submit a user's PR, but I wanted to make sure you know our appreciation!

I've just created a pull request adding us as a user. I thought I'd done this already, so sorry for the delay. We have actually just gone live with this feature.