mallocator / Elasticsearch-Exporter

A small script to export data from one Elasticsearch cluster into another.

es_rejected_execution_exception on import from file

kikulikov opened this issue · comments

I'm trying to import data from a file into an ES database.

But the exporter starts throwing errors as soon as I run it.

Waiting for mapping on target host to be ready, queue length 8512
Mapping is now ready. Starting with 8512 queued hits.
{"index":{"_index":".kibana","_type":"config","_id":"4.1.1","status":429,"error":{"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TransportService$4@54ae9874 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@14a7948d[Running, pool size = 2, active threads = 2, queued tasks = 50, completed tasks = 16304]]"}}}

I've tried changing the ES index settings with:

curl -s "xxx:9200/_settings" -XPUT -d '{ "index" : { "number_of_replicas" : 0 } }'
curl -s "xxx:9200/_settings" -XPUT -d '{ "index" : { "refresh_interval" :  -1} }'

but it doesn't seem to help a lot.

I'm using ES 2.4.1, Elasticsearch Exporter - Version 1.4.0

Is there any way to add a delay to the import so that ES has time to index the data?

Thank you in advance.

commented

Version 1.4.0 doesn't have timeout functionality. You can increase the number of retries, though (--errorsAllowed or -e), and just keep trying to import the data that way.
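If you end up scripting the import yourself, the general idea of "retry what was rejected, with a pause in between" can be sketched roughly like this. This is only an illustration and not how the exporter works internally: `import_with_retry`, `rejected_ids`, `backoff_delays` and the `send` callback are hypothetical names, and `send` is assumed to return bulk response items shaped like the one in your log (an action key such as "index" wrapping `_id` and `status`).

```python
import time
from typing import Callable, Dict, List

Item = Dict[str, dict]


def rejected_ids(response_items: List[Item]) -> List[str]:
    """Return the _id of every bulk item that came back with status 429."""
    ids = []
    for item in response_items:
        # Each item has a single key naming the action, e.g. "index".
        for result in item.values():
            if result.get("status") == 429:
                ids.append(result["_id"])
    return ids


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Exponentially growing pauses in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def import_with_retry(
    docs: Dict[str, dict],
    send: Callable[[Dict[str, dict]], List[Item]],  # hypothetical bulk sender
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Send docs, then resend only the rejected ones after each pause.

    Returns the documents that were still rejected after the last retry.
    """
    pending = dict(docs)
    # One initial attempt without a pause, then up to `retries` retries.
    for delay in [0.0] + backoff_delays(retries):
        if not pending:
            break
        if delay:
            sleep(delay)
        failed = set(rejected_ids(send(pending)))
        pending = {doc_id: doc for doc_id, doc in pending.items() if doc_id in failed}
    return pending
```

The growing pause is what gives the bulk queue on the target node time to drain. Anything still returned at the end was rejected on every attempt and needs a slower import or a bigger cluster.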

I assume you're still handling huge documents, which is why your target cluster can't keep up. You could also try increasing the number of threads for this import to increase the queue length. If you have a load balancer set up for your cluster, then using that instead of an individual node is another way to improve performance. Finally, you might want to think about splitting up your cluster into master and data nodes.

I don't know what your eventual requirements for hosting the data are, but consider that the importer is also a kind of benchmark for how much your cluster can take. Maybe you don't expect anywhere near the import's throughput in production, but this at least gives you an indicator that the current configuration could be tweaked (although you should still run proper load testing if that is the case).