ncb000gt / node-es

NodeJS module for ElasticSearch.

Only half the queries get executed (exactly half)

mtimofiiv opened this issue

commented

Hello, I am sending a bulk set of queries via the multiSearch() method. But the response does not match what I ask for.

When I do 46 queries, I get back 23. When I do 76, I get back 38. When I do 64, I get 32.

More specifically, I pass in an array with say 76 query objects in it, and I get back an array of 38 properly formatted results. There are no errors anywhere I can see.

I have tried sending with explicitly setting the index and without, and each time I get the same result.

Any idea why this would be happening?

Can you provide a code example?

commented

Certainly! Here's a function that does it. I initialise the module with the proper ES URI params and with a single index upon which I am searching, and run my bulk query set through here:

const url = require('url')
const ES_URI = url.parse(process.env.ELASTICSEARCH_URI)

const index = {
  _index: 'mgm_events',
  _type: 'event'
};

const config = {
  server: {
    port: ES_URI.port,
    host: ES_URI.hostname,
    secure: ES_URI.protocol.indexOf('https') > -1
  }
};

if (ES_URI.auth) config.server.auth = ES_URI.auth;

const mappings = {
  created_at: { type: 'date' }
};

const es = require('es')(Object.assign(config, index));

es.indices.mappings(Object.assign(index, mappings));

function runQuerySet(querySet) {
  return new Promise((resolve, reject) => {
    es.multiSearch({}, querySet, (err, bulkResultSet) => {
      if (err) return reject(err)

      /*
        Some logic here with processing the results, but for brevity's sake
        we will just return the counts cause that's why we're here...
      */

      return resolve({
        queryLength: querySet.length,
        resultLength: bulkResultSet.responses.length
      })
    })
  })
}

runQuerySet(queries).then(result => {
  console.log(result) // ends up being { queryLength: 4, resultLength: 2 }
})

Here are 2 sample queries I am reproducing the error with:

[
  { query: { bool: { must: [ { match: { name: 'create.offer' } } ] } } },
  { query: { bool: { must: [ { match: { name: 'landing' } } ] } } }
]

And the result (bulkResultSet) set ends up like this:

{ responses: [ { took: 29, timed_out: false, _shards: [Object], hits: [Object] } ] }
// dummy [Object]s shown, but that is not important

The expected response here would be 2 items in the responses array, one for each query. I only run 2 here but as I said earlier, it is always half the requested amount (so 26 requested queries gives me 13 results).
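Since the halved result comes back without any error, a small guard can make the mismatch visible. The following is only a sketch: `checkResponseCount` is a hypothetical helper, not part of the `es` module, and it assumes the result has the `{ responses: [...] }` shape shown above.

```javascript
// Hypothetical helper (not part of the es module): fail loudly when the
// number of responses does not match the number of queries that were sent.
function checkResponseCount(queryCount, bulkResultSet) {
  const responses = (bulkResultSet && bulkResultSet.responses) || [];
  if (responses.length !== queryCount) {
    throw new Error(
      `multiSearch returned ${responses.length} responses for ${queryCount} queries`
    );
  }
  return responses;
}

// Example with the shape from this report: 2 queries sent, 1 response back.
let mismatchMessage = null;
try {
  checkResponseCount(2, { responses: [{ took: 29, timed_out: false }] });
} catch (err) {
  mismatchMessage = err.message;
}
console.log(mismatchMessage);
```

Calling it inside the `multiSearch` callback, before resolving, would turn the silent truncation into a rejected promise.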

commented

I figured it out.

It comes down to the request format. In the Elasticsearch docs, every multi search example has a header line before each query, whereas the comment in core.js shows only queries. Sure enough, the request payload does not include any headers, so when ES receives it, it reads every other query as a header instead of a query, and only half of them get executed.
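To illustrate why the count is exactly half, here is a simplified model of how a header/body format consumes lines in pairs. It is not Elasticsearch's actual parser, and `pairHeaderAndBody` is a hypothetical name used only for this sketch.

```javascript
// Simplified model (an assumption, not Elasticsearch's real parser):
// the payload is read two entries at a time, first a header, then a body.
function pairHeaderAndBody(lines) {
  const searches = [];
  for (let i = 0; i + 1 < lines.length; i += 2) {
    searches.push({ header: lines[i], body: lines[i + 1] });
  }
  return searches;
}

// Two queries sent without headers, as in the report above.
const queries = [
  { query: { bool: { must: [{ match: { name: 'create.offer' } }] } } },
  { query: { bool: { must: [{ match: { name: 'landing' } }] } } }
];

const paired = pairHeaderAndBody(queries);
console.log(paired.length); // 1: the first query was consumed as a header
```

Under this model, 76 entries yield 38 searches, which matches the counts reported at the top of the issue.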

So I guess my question is this: should this be documented (so people can structure the queries parameter to include headers), or should this method take an extra argument?

Let me know, I could do a PR for either.

Hi @mtimofiiv - yes, that is correct - there needs to be a "header" prior to each query that includes things like the index and searchType.

If the payload example from above is altered to look as follows, does the query result in the correct number of results?

[
  { },
  { query: { bool: { must: [ { match: { name: 'create.offer' } } ] } } },
  { },
  { query: { bool: { must: [ { match: { name: 'landing' } } ] } } }
]
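For larger query sets, the interleaving shown above could be done programmatically. This is a minimal sketch: `withHeaders` is a hypothetical helper, not something the `es` module provides, and the empty default header assumes the index configured on the client is the one being searched.

```javascript
// Hypothetical helper (not part of the es module): put a header object
// in front of every query so the payload alternates header, query, ...
function withHeaders(queries, header = {}) {
  const payload = [];
  for (const query of queries) {
    // Copy the header so each entry is an independent object.
    payload.push({ ...header }, query);
  }
  return payload;
}

const sampleQueries = [
  { query: { bool: { must: [{ match: { name: 'create.offer' } }] } } },
  { query: { bool: { must: [{ match: { name: 'landing' } }] } } }
];

const payload = withHeaders(sampleQueries);
console.log(payload.length); // 4: one header plus one query per search
```

The resulting array would then be passed as the second argument to `es.multiSearch`, in place of `querySet` in the earlier example. Any comparison against the response count should use the number of queries, not the payload length.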

commented

Ok cool. Would it be nice then to note this in the docs for future users? I can submit a PR to specify this in the readme.

@mtimofiiv - sorry for taking so long to respond... a PR would be awesome!

Closing for now... no modifications were added to the documentation, but a really great plan would be to update links in the readme to the elastic.co docs for each function.