sirensolutions / siren-join

[This is the old, single node version for Elasticsearch 2.x, see the latest "Siren Federate" plugin for distributed Elasticsearch 5.x and 6.x capabilities]

Home Page: http://siren.io

support joining on a metafield

paulwilton opened this issue · comments

Hi
Is it possible to join a field from the inner index to the "_id" field (the Elasticsearch identifier) of the docs in the outer index, rather than to a field from the _source object? For example:

{
  "bool" : {
    "filter" : {
      "filterjoin" : {
        "_id" : {
          "indices" : ["my-index"],
          "path" : "pathToKeyId",
          "query" : {
            "bool" : {
              "filter" : [
                {"term" : {"someField" : "someValue"}}
              ]
            }
          }
        }
      }
    }
  }
}

It should be possible to join on the _id field. The only drawback is that you cannot use doc_values with it, although Elasticsearch is working on allowing this. As a result, you might need more memory to perform the join.

Please reply if it actually doesn't work! Thanks.

@paulwilton I tried on my end and it doesn't seem possible. We'll work on supporting this.

Hi Stéphane
No problem. I checked as well and have now worked around it by surfacing a key as a document property in the target index. I had assumed (possibly incorrectly) that using the "_id" property in the outer index would be more performant, since the Elasticsearch index would have an efficient lookup mechanism for it.

thanks, Paul

@paulwilton after some investigation: the _id field in Elasticsearch is not indexed [1]; it is derived from the _uid field.
There is a discussion in Elasticsearch about supporting doc values for the _id field [2]. If that issue is resolved, it will be easier for us to support this. Right now, to support joining on the _id field, we would have to add an internal mapping from the _id field to the _uid field. An easier fallback, as you suggested, is to explicitly add the id as a document field (not indexed, but with doc values enabled). This comes at the cost of a larger index, but on the positive side, using doc values is more heap friendly (the doc values are cached off heap).

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
[2] elastic/elasticsearch#11887
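
For anyone landing here later, the fallback described above could be sketched roughly as follows. This is only an illustration under assumptions: the field name `key_id` and the type name `my_type` are hypothetical (they do not appear in this thread), and the mapping uses Elasticsearch 2.x-style string settings (`"index": "no"` with `"doc_values": true`), which should be checked against your Elasticsearch version. The snippet only builds the request bodies as plain dictionaries; it does not talk to a cluster.

```python
import json

# Hypothetical name for the explicit copy of the document id (not from the thread).
KEY_FIELD = "key_id"


def outer_index_mapping(doc_type="my_type"):
    """Mapping body for the outer index: the id copy is not indexed but has
    doc values enabled (Elasticsearch 2.x-style settings, an assumption)."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    KEY_FIELD: {
                        "type": "string",
                        "index": "no",
                        "doc_values": True,
                    }
                }
            }
        }
    }


def with_key(doc_id, source):
    """Return a copy of the document source with the id surfaced as a field."""
    doc = dict(source)
    doc[KEY_FIELD] = doc_id
    return doc


def filterjoin_on_key(inner_indices, path, inner_query):
    """Same query shape as in the original question, joining on the explicit
    key field instead of _id."""
    return {
        "bool": {
            "filter": {
                "filterjoin": {
                    KEY_FIELD: {
                        "indices": inner_indices,
                        "path": path,
                        "query": inner_query,
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    inner = {"bool": {"filter": [{"term": {"someField": "someValue"}}]}}
    print(json.dumps(outer_index_mapping(), indent=2))
    print(json.dumps(with_key("doc-1", {"title": "example"}), indent=2))
    print(json.dumps(filterjoin_on_key(["my-index"], "pathToKeyId", inner), indent=2))
```

The trade-off is the one mentioned above: the id is stored a second time, so the index grows, but the join can read the key from doc values instead of loading it onto the heap.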