spotify / voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.

Home Page: https://spotify.github.io/voyager/

Metadata Filter Capability

krpapadopoulos opened this issue

Could vectors be added with metadata to support metadata-filtered ANN search?

Curious how this may be handled with the existing implementation other than creating indices for different metadata categories?

Hey @krpapadopoulos, sorry about losing track of this issue -- we're setting up some structure so that we respond to things in a more timely manner going forward.
We have definitely discussed implementing metadata filter support but it's a pretty large undertaking that we unfortunately don't have time to work on at the moment. We are definitely open to contribution of this feature if you or anyone else would like to take it on!

I'm currently working on an implementation of StringIndex in core which will hopefully lay some groundwork for metadata support.

When discussing how we might implement this, we proposed a solution along the following lines:

  • Metadata associated with each item would be passed in a parallel array to the items during addItems, or added to an item at a later time with a dedicated message
  • At query time, the algorithm would take any metadata filter values and skip over any items in the deepest index level which don't satisfy the filter criteria (Note: this will have a runtime performance impact, since filtered-out items are still visited before being skipped)
  • Metadata/labels would likely be stored in the index Metadata in the core library
  • These metadata categories would get written out to the index file as binary POD (plain old data) and then loaded again during index load
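
To make the proposal above a bit more concrete, here is a minimal Python sketch of the general idea. It is not Voyager's API and not a design the maintainers have committed to. `ToyFilteredIndex`, its `add_items`/`query`/`save`/`load` methods, the `labels` and `allowed_labels` parameters, and the file layout are all hypothetical names made up for illustration. A brute-force scan stands in for the traversal of the deepest index level, and each item carries a single integer label as its metadata.

```python
import heapq
import io
import math
import struct


class ToyFilteredIndex:
    """Brute-force stand-in for an ANN index (hypothetical, not Voyager's API)."""

    def __init__(self, num_dimensions):
        self.num_dimensions = num_dimensions
        self.vectors = []  # item id -> vector
        self.labels = []   # parallel array: item id -> integer label

    def add_items(self, vectors, labels):
        # Metadata arrives as an array parallel to the vectors.
        if len(vectors) != len(labels):
            raise ValueError("vectors and labels must have the same length")
        for vector, label in zip(vectors, labels):
            if len(vector) != self.num_dimensions:
                raise ValueError("vector has the wrong number of dimensions")
            self.vectors.append(list(vector))
            self.labels.append(int(label))

    def query(self, vector, k, allowed_labels=None):
        # Every item is still visited; items failing the filter are skipped,
        # which is where the extra query-time cost comes from.
        candidates = []
        for item_id, (stored, label) in enumerate(zip(self.vectors, self.labels)):
            if allowed_labels is not None and label not in allowed_labels:
                continue
            candidates.append((math.dist(vector, stored), item_id))
        return heapq.nsmallest(k, candidates)  # [(distance, item_id), ...]

    def save(self, stream):
        # Made-up layout: header, then per item float32 values and an int32 label.
        stream.write(struct.pack("<II", self.num_dimensions, len(self.vectors)))
        for vector, label in zip(self.vectors, self.labels):
            stream.write(struct.pack(f"<{self.num_dimensions}f", *vector))
            stream.write(struct.pack("<i", label))

    @classmethod
    def load(cls, stream):
        num_dimensions, count = struct.unpack("<II", stream.read(8))
        index = cls(num_dimensions)
        for _ in range(count):
            raw = stream.read(4 * num_dimensions)
            vector = struct.unpack(f"<{num_dimensions}f", raw)
            (label,) = struct.unpack("<i", stream.read(4))
            index.add_items([vector], [label])
        return index


index = ToyFilteredIndex(num_dimensions=2)
index.add_items([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], labels=[7, 8, 7])

# Item 1 is the closest vector overall, but its label (8) fails the filter.
hits = index.query([0.9, 0.0], k=2, allowed_labels={7})

# Round-trip through the binary format; the labels survive the reload.
buffer = io.BytesIO()
index.save(buffer)
buffer.seek(0)
reloaded = ToyFilteredIndex.load(buffer)
```

In a real graph-based index the skip would happen while walking neighbors rather than in a flat loop, and a very selective filter could leave fewer than `k` results unless the search is widened. The sketch ignores both concerns.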

Happy to discuss this more with anyone willing to take it on!