opendistro-for-elasticsearch / k-NN

🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro.

Home Page: https://opendistro.github.io/

How to access an element of `knn_vector`?

hppRC opened this issue · comments

Hi, I'm new to ODFE, and I'm looking for a way to access the individual elements of a knn_vector value and use them in a Painless script.

I would like to run a query like this.
(Assume each document has a field DOC_VECTOR of type knn_vector.)

{
    "size": 10,
    "query": {
        "script_score": {
            "query": {
                "match_all": {}
            },
            "script": {
                "lang": "painless",
                "params": {
                    "query_vector": [0.1, 0.2]
                },
                "source": "params.query_vector[0] * doc['DOC_VECTOR'][0]"
            }
        }
    }
}

For a case like this, I think a plain array field would be the best fit. Still, it would be convenient to be able to index into a knn_vector like an array.
One example is giving a single dimension a special weight when experimenting to see whether a particular dimension of an embedding carries important information.

However, I suppose this is currently impossible because KNNVectorScriptDocValues#get is not supported (https://github.com/opendistro-for-elasticsearch/k-NN/blob/4423a57eaff5c3f78771ffd3e9f71ec6615a8aec/src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNVectorScriptDocValues.java).

So my questions are:

  • Is there any other way to access an element of a knn_vector in a Painless script?
  • If not, what are the difficulties in supporting it?

Sorry if I sound rude.
If there is anything I can do, I would like to contribute.

Thank you.

Hi @hppRC ,
As of now, we don't support direct access to the vector. The use case we enabled, starting from ODFE 1.13.0, was allowing users to use doc values inside predefined similarity scoring methods. We will take your use case under consideration and see how we can support it. In the meantime, you can try accessing the value through _source in the following way. Please let us know if it does not work in your case.

{
    "size": 10,
    "query": {
        "script_score": {
            "query": {
                "match_all": {}
            },
            "script": {
                "lang": "painless",
                "params": {
                    "query_vector": [0.1, 0.2]
                },
                "source": "params.query_vector[0] * params._source['DOC_VECTOR'][0]"
            }
        }
    }
}
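
For anyone who wants to try this for several dimensions, here is a minimal Python sketch that builds the same request body for an arbitrary dimension index. The helper names (`build_dimension_query`, `expected_score`) are hypothetical and not part of ODFE or any client library. The sketch only constructs the JSON; sending it to the `_search` endpoint is left to whatever client you use. Note that reading from `params._source` loads the stored document for every hit, so it is likely slower than doc values on large indices, and that `script_score` rejects negative scores.

```python
import json


def build_dimension_query(field, query_vector, dim, size=10):
    """Build a script_score body that multiplies one dimension of the
    query vector by the same dimension of the stored vector.

    Hypothetical helper: the script reads the vector from _source,
    as suggested in this thread, because doc['...'][i] is not
    supported for knn_vector.
    """
    if not 0 <= dim < len(query_vector):
        raise ValueError("dim must be a valid index into query_vector")
    # The field name and index are written into the script text so that
    # the result has the same shape as the query shown in this thread.
    source = f"params.query_vector[{dim}] * params._source['{field}'][{dim}]"
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "lang": "painless",
                    "params": {"query_vector": list(query_vector)},
                    "source": source,
                },
            }
        },
    }


def expected_score(doc_vector, query_vector, dim):
    """Compute locally the value the script should return for one document."""
    return query_vector[dim] * doc_vector[dim]


if __name__ == "__main__":
    body = build_dimension_query("DOC_VECTOR", [0.1, 0.2], dim=0)
    print(json.dumps(body, indent=4))
```

Calling `build_dimension_query("DOC_VECTOR", [0.1, 0.2], dim=0)` reproduces the request above; changing `dim` lets you compare how much each dimension contributes to the score.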

Hi @VijayanB ,

Thank you for your quick and polite response!
I performed your suggested query and found that it was what I was looking for.
Now that my problem is resolved, it's okay to close this issue.

Thanks a lot.