danpaz / bodybuilder

An elasticsearch query body builder :muscle:

Home Page:http://bodybuilder.js.org

how to filter some values and show the rest of the field values?

vinodkrishnanr opened this issue · comments

Hi

So we have

EDIT:

Apologies, I should have explained the scenario completely.

We need to apply a field-level filter and a value-level filter to ES data.

For the field-level filter, filter on the fields below:

field1, field2.

For the value-level filter, filter on the field values below. We have an include-values filter and an exclude-values filter.

Include value-level filter - Only include the mentioned values from the input. If the include filter is null, then include all values of all fields in the field-level filter.

includefilter= 
[
[field1, value 1]
]

The include filter above means: for field1, keep only value 1, but show all other field values in the field-level filter, i.e. all values of field2.

Exclude value-level filter - Only exclude the mentioned values from the input. If the exclude filter is null, ignore it and include all values of all fields in the field-level filter.

excludefilter= 
[
[field2, value 2],
]

The exclude filter above, applied in addition to the include filter, means: exclude value 2 for field2, but show all other field values in the field-level filter, i.e. all values of field2 except value 2.

ES Data:

Results from Elasticsearch with the actual data are below:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 28,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "test",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi5OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "value1",
                    "field2": "value1",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },
             {
                "_index": "test",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi6OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "fair",
                    "field2": "value2",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },
             {
                "_index": "test",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi7OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "square",
                    "field2": "value3",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },
            {
                "_index": "test",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi7OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "square",
                    "field2": "value4",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            }
        ]
    }
}

So for the field-level filter, we use
.query('query_string', { query: 'value1', fields: ['field1', 'field2'] })

Include value level filter (filtering only certain values on a field), we use

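.orFilter('bool', inc => 
    {
      // add a terms filter for every [field, values] pair, not only the first one
      for (const [field, values] of (includefilter || []))
      {
        if (values)
        {
          inc.filter('terms', field, values.split(','))
        }
      }
      // the nested callback has to return the builder
      return inc
    })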

Exclude value level filter (Exclude values on a field), we use

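 .notFilter('bool', exc => 
    {
      // add a terms filter for every [field, values] pair, not only the first one
      for (const [field, values] of (excludefilter || []))
      {
        if (values)
        {
          exc.filter('terms', field, values.split(','))
        }
      }
      // the nested callback has to return the builder
      return exc
    })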

Final bodybuilder query

bodybuilder()
    .query('query_string', { query: 'value1', fields: ['field1'] })
    .orFilter('bool', inc => 
    {
      // include filter: one terms filter per [field, values] pair
      for (const [field, values] of (includefilter || []))
      {
        if (values)
        {
          inc.filter('terms', field, values.split(','))
        }
      }
      return inc
    })
    .notFilter('bool', exc => 
    {
      // exclude filter: one terms filter per [field, values] pair
      for (const [field, values] of (excludefilter || []))
      {
        if (values)
        {
          exc.filter('terms', field, values.split(','))
        }
      }
      return exc
    })
    .build()
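
For comparison, the same include and exclude lists can be turned into a raw Elasticsearch bool query without any library. This is only a minimal sketch under assumptions: `toTermsClauses` and `buildValueFilters` are hypothetical helper names, values are comma-separated strings as in the snippets above, and the body that bodybuilder actually generates may be shaped differently.

```javascript
// Hypothetical helper: turns [field, "v1,v2"] pairs into terms clauses.
function toTermsClauses(pairs) {
  return (pairs || [])
    .filter(([, values]) => Boolean(values)) // skip pairs with no values
    .map(([field, values]) => ({ terms: { [field]: values.split(',') } }));
}

// Hypothetical helper: include pairs become "should" clauses,
// exclude pairs become "must_not" clauses, all in filter context.
function buildValueFilters(includefilter, excludefilter) {
  const bool = {};
  const include = toTermsClauses(includefilter);
  const exclude = toTermsClauses(excludefilter);
  if (include.length > 0) {
    bool.should = include;
    bool.minimum_should_match = 1; // at least one include clause must match
  }
  if (exclude.length > 0) {
    bool.must_not = exclude;
  }
  return { query: { bool: { filter: { bool } } } };
}

const body = buildValueFilters([['field1', 'value1']], [['field2', 'value2']]);
console.log(JSON.stringify(body, null, 2));
```

A null or empty list simply contributes no clauses, which matches the "ignore the filter when it is null" rule described earlier.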

So with all the filters applied expected result would be

field 1, value1
field 2, value1
field 2, value3
field 2, value4

Expected ES result after filter

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 28,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "test",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi5OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "value1",
                    "field2": "value1",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },
             {
                "_index": "test",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi7OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "square",
                    "field2": "value3",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },
            {
                "_index": "test",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi7OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "square",
                    "field2": "value4",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            }
        ]
    }
}

I think it would be easier if you provide some sample ES data. The issue is a bit hard to follow.

Hi @ferronrsmith,

Sorry, I have now edited my description to reflect the scenario. Thank you for the response.

I get what you're trying to do, but it still doesn't feel like the data (input) matches your expected output. The issue should have input params, code and expected code.

Not seeing the input that would yield this output ...

field 1, value1
field 2, value1
field 2, value3
field 2, value4

ES Data is the input, Expected ES result is the output. Filter from the user is the include and exclude.

When I say input, I mean the input being passed to bodybuilder: the full filter listing, so that I can replicate and test.

@ferronrsmith

I think it's a little hard to explain. But I just realised that the filter and the query should be for fields of the same index? If there are multiple indices, then the filter wouldn't work. Any idea how I can write a query with a filter for multiple indices? For example, if I have

index = [Index1, Index2, Index3]

then I would like bodybuilder to work for each index iteratively. Is this possible?

Yeah, I understand that part. The part I don't get is when you say ...

field 1, value1
field 2, value1
field 2, value3
field 2, value4

... I don't see the input provided. I don't know which values are part of the include filter and which are part of the exclude filter. Understand?

Is it

includefilter= 
[
[field1, value1],
[field2, value1],
[field2, value3],
[field2, value4]
]

???

@ferronrsmith

For now let's ignore the exclude filter so we can keep this simple.

We now have 2 indices, test1 and test2 (shown in ESData below)

Include Filter is

includefilter= 
[
[field1, value 1]
]

Apply this on ESData

ESData:

 "hits": [
            {
                "_index": "test1",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi5OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "value1",
                    "field2": "value1",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },
             {
                "_index": "test1",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi5OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "square",
                    "field2": "value1",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },
             {
                "_index": "test1",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi6OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "fair",
                    "field2": "value2",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },
              {
                "_index": "test2",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi6OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "fair",
                    "field2": "value2",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },

Expected Output after applying the filter

 "hits": [
            {
                "_index": "test1",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi5OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "value1",
                    "field2": "value1",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },
              {
                "_index": "test2",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi6OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "fair",
                    "field2": "value2",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },

Note that the 2nd and 3rd records are not in the output because the include filter has [field1, value 1], which matches only the first record in ESData. Any other field1 value in test1 (i.e. index test1->field1->'square' and index test1->field1->'fair') should not be in the output.

So your expected output is invalid. In your scenario it seems the filter is only applied to test1 and not test2. Shouldn't it just be:

 "hits": [
            {
                "_index": "test1",
                "_type": "_doc",
                "_id": "86Pz1XMBuVm1OXi5OK_k",
                "_score": 1.0,
                "_source": {
                    "archived": "false",
                    "field1": "value1",
                    "field2": "value1",
                    "last_modified_date": "2020-08-10T01:19:13.007Z"
                }
            },
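
To make the point concrete, here is a small in-memory sketch (plain JavaScript, no Elasticsearch involved) of what an exact-match filter on field1 does to the four sample hits from ESData. The `matchesTerms` helper is hypothetical and only mimics exact-value matching; it illustrates the expected result and is not how Elasticsearch evaluates the filter internally.

```javascript
// The four sample hits from ESData, reduced to the relevant fields.
const hits = [
  { _index: 'test1', _source: { field1: 'value1', field2: 'value1' } },
  { _index: 'test1', _source: { field1: 'square', field2: 'value1' } },
  { _index: 'test1', _source: { field1: 'fair', field2: 'value2' } },
  { _index: 'test2', _source: { field1: 'fair', field2: 'value2' } },
];

// Hypothetical stand-in for a terms filter: keep a hit only if the
// field's value is one of the allowed values, whatever its index is.
function matchesTerms(hit, field, allowed) {
  return allowed.includes(hit._source[field]);
}

const kept = hits.filter((hit) => matchesTerms(hit, 'field1', ['value1']));
console.log(kept.map((hit) => `${hit._index}: field1=${hit._source.field1}`));
// logs [ 'test1: field1=value1' ]
```

The test2 hit has field1 = 'fair', so a filter on field1 = value1 drops it just like the non-matching test1 hits.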

This isn't really a bodybuilder thing, but when you're passing the body to the ES library or some custom code, you're allowed to specify multiple indices.

This is what I did (below is the query generated from bodybuilder):

curl --location --request POST 'http://localhost:9200/test*/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {
              "bool": {
                "filter": {
                  "terms": {
                    "field1": [
                      "value1"
                    ]
                  }
                }
              }
            }
          ]
        }
      },
      "must": {
        "query_string": {
          "query": "value1",
          "fields": ["field1"]
        }
      }
    }
  }
}'

From ES : https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html
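
As a rough illustration of targeting several indices with one body, the sketch below assembles the pieces of a search request for a list of index names. `buildSearchRequest` is a hypothetical helper, not part of bodybuilder or of any Elasticsearch client; it relies only on Elasticsearch accepting a comma-separated index list (or a wildcard such as test*) in the search path.

```javascript
// Hypothetical helper: describes a _search request that targets
// several indices at once with a single query body.
function buildSearchRequest(baseUrl, indices, body) {
  // Index names are joined with commas, e.g. test1,test2
  const target = indices.map((name) => encodeURIComponent(name)).join(',');
  return {
    url: `${baseUrl}/${target}/_search`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  };
}

const request = buildSearchRequest(
  'http://localhost:9200',
  ['test1', 'test2'],
  { query: { match_all: {} } }
);
console.log(request.url); // http://localhost:9200/test1,test2/_search
```

The returned object can be handed to whatever HTTP code sends the query, so the same generated body is applied to every listed index in one call instead of looping over them.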

Yes, the filter is only applied to field1-value1 in test1. Doesn't that mean I should still be able to see all the records from test2, because the query_string includes the fields from test2? If not, how can I achieve this?

@ferronrsmith

Please see the Elasticsearch query below, which is exactly what I want to achieve. It works as intended.

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "filter": [
              {
                "terms": {
                  "field1": [
                    "value1"
                  ]
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "query_string": {
                  "query": "*fair*",
                  "fields": [
                    "field1",
                    "field2"
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}

This produces filtered results of field 1 with value 1 and field 2 with the query fair.
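
For anyone wanting to generate that body from the include list and the search text, a minimal sketch in plain JavaScript follows. `buildIncludeOrSearch` is a hypothetical helper name, and the sketch assumes each include entry is a [field, comma-separated values] pair as earlier in this issue; it builds the object by hand rather than through bodybuilder.

```javascript
// Hypothetical helper: ORs the include filters with a query_string search.
function buildIncludeOrSearch(includefilter, queryText, fields) {
  // One bool/filter/terms clause per include pair that has values.
  const should = (includefilter || [])
    .filter(([, values]) => Boolean(values))
    .map(([field, values]) => ({
      bool: { filter: [{ terms: { [field]: values.split(',') } }] },
    }));
  // The free-text search is just another alternative in the same "should".
  should.push({
    bool: { must: [{ query_string: { query: queryText, fields } }] },
  });
  return { query: { bool: { should } } };
}

const searchBody = buildIncludeOrSearch(
  [['field1', 'value1']],
  '*fair*',
  ['field1', 'field2']
);
console.log(JSON.stringify(searchBody, null, 2));
```

With these inputs the printed object has the same structure as the query above.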

Your query is fine. I just think you need to adjust the logic that queries the indices so that it queries multiple indices.

That code isn't shown, but I'm assuming you know what I'm talking about