alexklibisz / elastiknn

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.

Home Page: https://alexklibisz.github.io/elastiknn

Null Pointer Exception when searching by vector Id for an Id that doesn't exist

bcrastnopol opened this issue

Describe the bug
I have an index with a cosine LSH mapping, and when I run nearest neighbor searches (exact or approximate) by vector id, I get the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "runtime_exception",
        "reason" : "Failed to retrieve vector at index [my-index] id [1] field [v]"
      }
    ],
    "type" : "runtime_exception",
    "reason" : "Failed to retrieve vector at index [my-index] id [1] field [v]",
    "caused_by" : {
      "type" : "null_pointer_exception",
      "reason" : "Cannot invoke \"java.util.Map.get(Object)\" because the return value of \"org.elasticsearch.action.get.GetResponse.getSourceAsMap()\" is null"
    }
  },
  "status" : 500
}
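
The `caused_by` message is consistent with the lookup document not existing: the source map of the get response is null, and reading the field from it fails. On an affected version, one possible client-side stopgap is to fetch the document first and only issue the nearest neighbor query by id when it exists. The following is a minimal sketch, assuming it is given the already-parsed JSON body of Elasticsearch's get-document response (which carries `"found": false` and no `_source` for a missing id). The names `VectorNotFound` and `vector_from_get_response` are hypothetical and not part of Elastiknn or any client library.

```python
from typing import Any, Mapping


class VectorNotFound(LookupError):
    """Raised when the document or field referenced by the query cannot be read."""


def vector_from_get_response(body: Mapping[str, Any], field: str) -> Any:
    """Return the stored value of `field` from a parsed get-document response.

    Assumption: `body` is the JSON body of a GET on a single document,
    already decoded into a dict. The stored value is returned as-is.
    """
    index = body.get("_index")
    doc_id = body.get("_id")

    # A missing document is reported with "found": false and no "_source".
    if not body.get("found", False):
        raise VectorNotFound(f"document [{doc_id}] does not exist in index [{index}]")

    # Guard against a null or absent source instead of dereferencing it.
    source = body.get("_source") or {}
    if field not in source:
        raise VectorNotFound(f"document [{doc_id}] in index [{index}] has no field [{field}]")

    return source[field]


if __name__ == "__main__":
    missing = {"_index": "my-index", "_id": "1", "found": False}
    try:
        vector_from_get_response(missing, "v")
    except VectorNotFound as err:
        print(f"skipping query: {err}")
```

The point of the sketch is only that a missing id becomes a clear, catchable client-side error rather than a server-side 500.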

Expected behavior
Searching for a vector with an id that doesn't exist should not return a 500 error.

Environment (please complete the following information):

  • Elastiknn version: 7.13.1.0 and 7.14.1.0
  • OS: Ubuntu Linux

To Reproduce
Steps to reproduce the behavior:
1. Create an index

{
 "settings": {
   "index": {
     "number_of_shards": 1,          
     "elastiknn": true               
   }
 }
}
2. Add a mapping
{
   "properties": {
       "my_vec": {
           "type": "elastiknn_dense_float_vector",
           "elastiknn": {
               "dims": 100,                      
               "model": "lsh",                   
               "similarity": "cosine",             
               "L": 99,                            
               "k": 1                              
           }
       }
   }
}
3. Search by id for a vector that doesn't exist
{
  "_source": [
    "vid"
  ],
  "size": 10,
  "query": {
    "elastiknn_nearest_neighbors": {
      "model": "lsh",
      "similarity": "cosine",
      "candidates": 100,
      "field": "v",
      "vec": {
        "index": "my-index",
        "field": "v",
        "id": "1"
      }
    }
  }
}
4. See error
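
The steps above can be collected into a small script that builds the three request bodies. This is a sketch under stated assumptions: the report does not give the HTTP methods or paths, so the ones used here (`PUT /<index>`, `PUT /<index>/_mapping`, `GET /<index>/_search`) are the standard Elasticsearch endpoints and are an assumption. The field name is a parameter because the report's mapping names the field `my_vec` while the query and the error message use `v`. The function `build_repro_requests` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]


def build_repro_requests(
    index: str = "my-index", field: str = "v", missing_id: str = "1"
) -> List[Request]:
    """Return (method, path, body) triples for steps 1-3 of the report.

    The bodies mirror the JSON in the report; methods and paths are assumed.
    """
    create_index = {
        "settings": {"index": {"number_of_shards": 1, "elastiknn": True}}
    }
    mapping = {
        "properties": {
            field: {
                "type": "elastiknn_dense_float_vector",
                "elastiknn": {
                    "dims": 100,
                    "model": "lsh",
                    "similarity": "cosine",
                    "L": 99,
                    "k": 1,
                },
            }
        }
    }
    search = {
        "_source": ["vid"],
        "size": 10,
        "query": {
            "elastiknn_nearest_neighbors": {
                "model": "lsh",
                "similarity": "cosine",
                "candidates": 100,
                "field": field,
                # Reference an indexed vector by id; this id is never indexed.
                "vec": {"index": index, "field": field, "id": missing_id},
            }
        },
    }
    return [
        ("PUT", f"/{index}", create_index),
        ("PUT", f"/{index}/_mapping", mapping),
        ("GET", f"/{index}/_search", search),
    ]


if __name__ == "__main__":
    for method, path, body in build_repro_requests():
        print(method, path)
        print(json.dumps(body, indent=2))
```

Sending the printed requests in order to a cluster running an affected version should, per the report, produce the 500 response on the last one.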

Additional context
I just want to say that this plugin is incredible! I'm running a very large cluster (100m+ vectors) and I've gotten great results compared to some other nearest neighbor libraries. Plus, this has the added benefit of incremental index updates! Keep up the great work here!

Thanks! It's probably a simple fix. I'll look into it in the next couple days.

Hi @bcrastnopol, the issue should be resolved in the 7.14.1.1 release. I resolved it in #311 and added some regression tests. Feel free to re-open this issue if it's still a problem.

I also opened #312 to look at some related latent issues and/or inconsistencies in exception handling.

Thanks for the kind words about the plugin. If you can, consider submitting a PR to add your use-case to the readme. I don't use the plugin in my day-to-day work so it's always neat to hear how it's being used.

Thank you for fixing this - it works great!

We're still evaluating a couple of solutions but I'm advocating for this. If we end up using it I'll find out what I'm allowed to share publicly and submit a PR!