opendistro-for-elasticsearch / k-NN

🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro.

Home Page: https://opendistro.github.io/

How to retrieve a knn_vector field?

nhatnambui opened this issue

I created an index with a knn_vector field using the mapping below:

{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 2
      }
    }
  }
}

I want to use the my_vector value of another document as the query_value in the k-NN search query below:

{
 "size": 4,
 "query": {
   "script_score": {
     "query": {
       "match_all": {}
     },
     "script": {
       "source": "knn_score",
       "lang": "knn",
       "params": {
         "field": "my_vector",
         "query_value": // I want to use my_vector value of another doc here,
         "space_type": "cosinesimil"
       }
     }
   }
 }
}

According to issue #341, I guess it is currently impossible to achieve that in a single request.

So in the first request I have to retrieve my_vector for a specific document, and then use it as query_value in a second request.

The problem is that when I retrieve my_vector using docvalue_fields, I get this error:

"reason": {
  "type": "unsupported_operation_exception",
  "reason": "knn vector field 'recommender_vector' doesn't support sorting"
}

I also tried "stored_fields", but it didn't work either.

How can I retrieve the value of a knn_vector field?

A workaround is to create a duplicate field of type float, as shown below, and read that instead, but I don't think it's optimal.

{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 2
      },
      "my_vector_duplicate": {
        "type": "float"
      }
    }
  }
}

@nhatnambui
I believe you would like to know how to retrieve the value of a knn_vector field. I hope the following example helps with your use case. You can change the query to filter documents according to your scenario.

GET my-knn-index-1/_search
{
  "query": {
    "exists": {
      "field": "my_vector"
    }
  },
  "script_fields": {
    "test1": {
      "script": "params._source['my_vector']"
    }
  }
}

The output will look like this:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-knn-index-1",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 1.0,
        "fields" : {
          "test1" : [
            1.5,
            5.5,
            4.5,
            6.4
          ]
        }
      }
    ]
  }
}
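To connect this back to the two-request approach from the original question, here is a minimal Python sketch of the client-side step between the two requests. It only manipulates plain dictionaries and makes no HTTP calls. The helper names `extract_vector` and `build_knn_score_query` are hypothetical, and the script field name `test1` and the field name `my_vector` are taken from the examples in this thread. It assumes the first response has the shape shown above.

```python
from typing import Any, Dict, List


def extract_vector(search_response: Dict[str, Any],
                   script_field: str = "test1") -> List[float]:
    """Return the vector stored under `script_field` in the first hit.

    Hypothetical helper: assumes the response shape shown in this thread.
    """
    hits = search_response["hits"]["hits"]
    if not hits:
        raise ValueError("search returned no hits, so there is no vector to reuse")
    return list(hits[0]["fields"][script_field])


def build_knn_score_query(vector: List[float],
                          field: str = "my_vector",
                          space_type: str = "cosinesimil",
                          size: int = 4) -> Dict[str, Any]:
    """Build the script_score request body from the original question,
    with `query_value` filled in from a previously retrieved vector."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": vector,
                        "space_type": space_type,
                    },
                },
            }
        },
    }


# Trimmed-down stand-in for the first response (illustrative values only).
first_response = {
    "hits": {"hits": [{"_id": "9", "fields": {"test1": [1.5, 5.5]}}]}
}

vector = extract_vector(first_response)
second_request_body = build_knn_score_query(vector)
```

The resulting `second_request_body` can then be sent as the body of the second `_search` request with whichever HTTP or Elasticsearch client is already in use.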

@VijayanB The query only works when _source is enabled, but in my case _source is disabled:

{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 2
      },
      "my_vector_duplicate": {
        "type": "float"
      }
    }
  }
}

Sorry for missing that info in my first post.

@nhatnambui Unfortunately, this approach doesn't work if _source is disabled ("enabled": false).

Hi @VijayanB!
Can you please explain why?