ncb000gt / node-es

NodeJS module for ElasticSearch.

Memcached Protocol

cpsubrian opened this issue

First, please consider this more of a conversation starter than an actual 'issue'.

Elasticsearch features an optional memcached transport that supports a subset of the REST API.

In some initial, ROUGH benchmarks, the memcached transport handled roughly 3.5 times the throughput of the REST API (full results below). I put together a quick bench comparing the two: https://github.com/cpsubrian/modeler-elasticsearch/blob/noclient/bin/bench.sh

The benchmark uses a wrapper I threw together that reuses the bulk of node-elasticsearch's API and just replaces lib/request.js with a memcached version. That driver still has to fall back to REST for HEAD requests (and possibly others that we'll find if we really dive in). You can find it here: https://github.com/cpsubrian/elasticsearch-memcached

Results are:

Please be patient.
{ http_parser: '1.0',
  node: '0.10.15',
  v8: '3.14.5.9',
  ares: '1.9.0-DEV',
  uv: '0.10.13',
  zlib: '1.2.3',
  modules: '11',
  openssl: '1.0.1e' }
Scores: (bigger is better)

elasticsearch-memcached
Raw:
 > 5.228771228771229
 > 4.932067932067932
 > 5.014985014985015
 > 5.1688311688311686
Average (mean) 5.086163836163836

elasticsearch
Raw:
 > 1.4155844155844155
 > 1.4475524475524475
 > 1.4195804195804196
 > 1.4175824175824177
Average (mean) 1.425074925074925

Winner: elasticsearch-memcached
Compared with next highest (elasticsearch), it's:
71.98% faster
3.57 times as fast
0.55 order(s) of magnitude faster
QUITE A BIT FASTER
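
For anyone checking the summary lines against the raw scores, here is a small sketch that recomputes them. The formulas are my reading of the output, not taken from the bench tool's source: "times as fast" as the ratio of the means, "% faster" as the gap divided by the winner's mean, and "order(s) of magnitude" as log10 of the ratio.

```javascript
// Raw scores copied from the benchmark output above.
const memcachedScores = [5.228771228771229, 4.932067932067932, 5.014985014985015, 5.1688311688311686];
const restScores = [1.4155844155844155, 1.4475524475524475, 1.4195804195804196, 1.4175824175824177];

// Arithmetic mean of an array of numbers.
const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

const fast = mean(memcachedScores);
const slow = mean(restScores);

// Assumed formulas (inferred from the printed numbers, not from the tool).
const timesAsFast = fast / slow;
const percentFaster = ((fast - slow) / fast) * 100;
const ordersOfMagnitude = Math.log10(timesAsFast);

console.log('times as fast:', timesAsFast.toFixed(2));
console.log('% faster (relative to winner):', percentFaster.toFixed(2));
console.log('orders of magnitude:', ordersOfMagnitude.toFixed(2));
```

If that reading is right, the "% faster" figure is measured against the winner's score, so it says the REST client reaches about 28% of the memcached throughput. Measured against the slower score instead, the same gap would come out at roughly 257%.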

So clearly this is at least worth pursuing further.

I think the first steps are implementing a couple of changes to your module:

  • Allow an alternative request handler to be passed in via createClient(options). This would let elasticsearch-memcached implement just the request handler, rather than having to 'copy' createClient() as well. It would also open the door to other request handlers that use alternative HTTP libraries (or whatever else people think up).
  • Possibly some light refactoring to move processing the actual request options into the request handler.
    • For example, the memcached protocol needs to add ?source={serialized data} to the path for _search requests (searches need to run through get() rather than set()). This would be easier if req.formatParameters() was called inside the request handler.
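
To make the two suggestions concrete, here is a rough sketch of what a pluggable handler could look like. Everything in it is hypothetical: the `request` option name, the shape of the request object, and the stand-in transports are assumptions for illustration, not the module's actual API, and no real network or memcached calls are made.

```javascript
// Hypothetical stand-in for the HTTP transport that lives in lib/request.js.
function httpRequest(req, callback) {
  callback(null, { transport: 'http', method: req.method, path: req.path });
}

// Hypothetical memcached-style handler. It receives the unprocessed request
// so it can build the path itself, as the second bullet suggests.
function memcachedRequest(req, callback) {
  // HEAD is not available over the memcached transport, so fall back to REST.
  if (req.method === 'HEAD') {
    return httpRequest(req, callback);
  }
  let path = req.path;
  let command = 'set';
  if (/\/_search$/.test(path)) {
    // Searches have to run through get(), with the body carried in ?source=.
    path += '?source=' + encodeURIComponent(JSON.stringify(req.data));
    command = 'get';
  }
  callback(null, { transport: 'memcached', command: command, path: path });
}

// Sketch of createClient() accepting an injected handler (assumed option name).
function createClient(options) {
  options = options || {};
  const request = typeof options.request === 'function' ? options.request : httpRequest;
  return {
    search: function (index, query, callback) {
      request({ method: 'POST', path: '/' + index + '/_search', data: query }, callback);
    },
    exists: function (index, callback) {
      request({ method: 'HEAD', path: '/' + index }, callback);
    }
  };
}

// Usage: same client API, different transport.
const client = createClient({ request: memcachedRequest });
client.search('products', { query: { match_all: {} } }, function (err, res) {
  console.log(res.command, res.path);
});
```

With this shape, elasticsearch-memcached would only need to export something like `memcachedRequest`, and the default behaviour stays unchanged when no handler is passed.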

Anyhow, really just hoping to get your thoughts. I don't want to spin my wheels if you're not interested in these kinds of changes. Thanks!

This is pretty interesting... I'll look through these ideas over the weekend and thank you for the suggestions on the module changes and refactors! Assigning this issue to myself for now...

Now published to npm as 0.3.8; closing issue.

Excellent! Thanks for the speedy review.