opendistro-for-elasticsearch / k-NN

🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro.

Home Page: https://opendistro.github.io/

Provide a way to unset the value of a vector field (reset)

Hronom opened this issue.

We need a way to unset the vector of a document. Unfortunately, unlike other field types, we cannot set it to null, because we get this error:

                "error": {
                    "type": "mapper_parsing_exception",
                    "reason": "failed to parse field [my_vector2] of type [knn_vector] in document with id '10'. Preview of field's value: 'null'",
                    "caused_by": {
                        "type": "illegal_argument_exception",
                        "reason": "Vector dimension mismatch. Expected: 4, Given: 0"
                    }
                }

Thanks for pointing this out. We will treat this as a feature request and prioritize it.

I came to this repo to ask for the same feature. lol

Just to add more context: I want to reduce disk usage and memory pressure by removing just the embedding field and setting the document's status to False.

Is there any other way to achieve this until the feature is implemented?

Thanks!

Hi @marcoaleixo @Hronom,
As @vamshin mentioned, we will work on allowing the knn_vector field to be set to null. In the meantime, have you tried removing the field from the document with the Update API?

Example: get document 4.

curl "localhost:9200/myindex/_doc/4?pretty"
{
  "_index" : "myindex",
  "_type" : "_doc",
  "_id" : "4",
  "_version" : 3,
  "_seq_no" : 7,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "my_dense_vector" : [
      10,
      10
    ],
    "color" : "BLUE"
  }
}

Here, I remove the my_dense_vector field from a single document using the Update API:

curl -X POST "localhost:9200/myindex/_update/4?pretty" -H 'Content-Type: application/json' -d'
{
  "script" : "ctx._source.remove(\"my_dense_vector\")"
}'

Note: you can also use the Update By Query API to remove the field from many documents at once, selecting only the documents where the field exists.
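A minimal sketch of that bulk approach, assuming the `exists` query works on your `knn_vector` field and that the cluster is reachable at `localhost:9200` (overridable through `ES_HOST`, a hypothetical variable). The helper names `remove_field_body` and `remove_field_from_index` are illustrative only, not part of the k-NN plugin. The field name is passed to Painless through `params` rather than spliced into the script source, so no quote escaping is needed:

```shell
#!/usr/bin/env bash
# Sketch only: strip a field from every document in an index that still has it.
# Assumptions: the `exists` query is supported on the knn_vector field, and
# ES_HOST (hypothetical) points at the cluster, defaulting to localhost:9200.

# Print the _update_by_query request body for the given field name.
# The field name is inserted verbatim, so it must not contain double quotes.
remove_field_body() {
  local field="$1"
  cat <<EOF
{
  "query": { "exists": { "field": "${field}" } },
  "script": {
    "source": "ctx._source.remove(params.field)",
    "lang": "painless",
    "params": { "field": "${field}" }
  }
}
EOF
}

# Send the request for INDEX and FIELD (hypothetical helper, not a plugin API).
remove_field_from_index() {
  local index="$1" field="$2"
  curl -s -X POST "${ES_HOST:-localhost:9200}/${index}/_update_by_query?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(remove_field_body "$field")"
}
```

Usage, with the index and field from the example above: `remove_field_from_index myindex my_dense_vector`. Like the single-document update, this is still a script-based update.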

When I get document 4 after the removal, the my_dense_vector field is no longer in _source:

curl "localhost:9200/myindex/_doc/4?pretty"
{
  "_index" : "myindex",
  "_type" : "_doc",
  "_id" : "4",
  "_version" : 2,
  "_seq_no" : 6,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "color" : "BLUE"
  }
}

Please let us know if this helps until we support setting a null value.

Thanks, waiting for the fix.

Unfortunately, your proposal doesn't fit our workflow, since we don't use script-based updates. But I believe it will work for someone else.