opendistro-for-elasticsearch / k-NN

🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro.

Home Page:https://opendistro.github.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Make HNSW algorithm parameters mapping parameters

jmazanec15 opened this issue · comments

For index creation, HNSW graphs take a couple different parameters: M, efConstruction, dimension, spaceType, etc. Currently, M, efConstruction, and spaceType are only configurable via index settings. Dimension is configurable via mapping parameters.

There are a few limitations/issues with having M, efConstruction, and spaceType only be configurable via index settings:

  1. We have seen some race conditions where, during mapper build, the index settings have not yet been set for the index, causing M, efConstruction and spaceType to fall back to defaults for that field. For spaceType, this can be a serious bug because it will default to l2 space as opposed to cosinesimil space in some cases.
  2. It prevents users from having multiple fields in the same index with different values for M, efConstruction, and spaceType. For instance, it is not possible to have a cosinesimil field and l2 field in the same index.

This is a feature request to make M, efConstruction and spaceType mapping parameters and to start to deprecate M, efConstruction and spaceType as index settings. In order to maintain backwards compatibility, it makes sense in KNNVectorFieldMapper to first check if the mapping parameters have been set, and, if not, fall back to reading index settings (as is currently done).

Please leave comments on this issue with thoughts/questions/concerns.

Thanks for looking into it!
(btw, I'd label it as a bug, not enhancement)

Is there some workaround until this is fixed?
Some memory gate? Splitting the index creation to couple of steps perhaps?

Hi @ofirnk , we are still looking into root cause of https://discuss.opendistrocommunity.dev/t/cosine-similarity-formula/4390/12. We haven't confirmed that this is causing that issue, yet. We are currently working to reproduce the issue. Once we are able to, we will create a separate issue that contains bug and workaround.

On "We have seen some race conditions where, during mapper build, the index settings have not yet been set for the index, causing M, efConstruction and spaceType to fall back to defaults for that field. For spaceType, this can be a serious bug because it will default to l2 space as opposed to cosinesimil space in some cases." Sorry, my wording here was not clear. We saw this happen in #239 and it may be the case for https://discuss.opendistrocommunity.dev/t/cosine-similarity-formula/4390/12, but have not yet confirmed.

Regardless, it would be an improvement to set M, efConstruction, spaceType as mapping parameters, as opposed to index settings. That is why I labeled as enhancement instead of bug.