jprante / elasticsearch-plugin-bundle

A bundle of useful Elasticsearch plugins

Baseform: memory optimization

ThaDafinser opened this issue

When I use the baseform plugin on a large number of documents (> 1,000,000), I get this error:

[2017-04-06T07:28:07,712][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [ultimate-1] fatal error in thread [elasticsearch[ultimate-1][clusterService#updateTask][T#1]], exiting
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:3236) ~[?:1.8.0_121]
	at org.xbib.elasticsearch.common.fsa.FSABuilder.expandBuffers(FSABuilder.java:468) ~[?:?]
	at org.xbib.elasticsearch.common.fsa.FSABuilder.serialize(FSABuilder.java:418) ~[?:?]
	at org.xbib.elasticsearch.common.fsa.FSABuilder.freezeState(FSABuilder.java:352) ~[?:?]
	at org.xbib.elasticsearch.common.fsa.FSABuilder.add(FSABuilder.java:204) ~[?:?]
	at org.xbib.elasticsearch.common.fsa.Dictionary.loadLines(Dictionary.java:43) ~[?:?]
	at org.xbib.elasticsearch.index.analysis.baseform.BaseformTokenFilterFactory.createDictionary(BaseformTokenFilterFactory.java:39) ~[?:?]
	at org.xbib.elasticsearch.index.analysis.baseform.BaseformTokenFilterFactory.<init>(BaseformTokenFilterFactory.java:27) ~[?:?]
	at org.xbib.elasticsearch.plugin.bundle.BundlePlugin$$Lambda$379/386311625.get(Unknown Source) ~[?:?]
	at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:361) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:171) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:155) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.index.IndexService.<init>(IndexService.java:145) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:363) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:427) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:392) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.execute(MetaDataCreateIndexService.java:364) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.service.ClusterService.executeTasks(ClusterService.java:679) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.service.ClusterService.calculateTaskOutputs(ClusterService.java:658) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:617) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.service.ClusterService$UpdateTask.run(ClusterService.java:1117) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:544) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:238) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:201) ~[elasticsearch-5.3.0.jar:5.3.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

Thanks for the report.

It's not a leak, but the FSA, as currently implemented, is quite memory hungry when an index is created. This will be investigated to reduce the memory footprint.

I just tried building a lot of indices with the settings below.

Even without using the filter explicitly, the memory still seems to be required (I got the same exception).

Update: I'm now going to recreate all indices without the baseform filter defined and see whether the plugin still crashes ES.

image

Even without using baseform (I also removed the filter definition), there still seems to be a memory problem.

@jprante I only created a total of 24 MB of indices/documents, but the JVM heap is full and Kibana is timing out again.

After disabling the whole plugin, the JVM memory usage is stable.

Quick idea: load the FSA only once, at ES startup?
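
To illustrate the "load only once" idea: the stack trace shows the dictionary being built inside the `BaseformTokenFilterFactory` constructor, which runs as part of index creation. A minimal sketch of sharing one loaded instance across all factories could look like the following. `SharedDictionaryCache` and its `loader` function are hypothetical names for illustration, not part of the plugin, and the loader stands in for whatever actually builds the FSA.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper (not part of the plugin): keeps one loaded
// dictionary per resource name so that repeated index creation
// reuses the same instance instead of rebuilding it.
final class SharedDictionaryCache<T> {

    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> loader;
    private final AtomicInteger loads = new AtomicInteger();

    // loader is an assumption: it stands in for the expensive FSA build.
    SharedDictionaryCache(Function<String, T> loader) {
        this.loader = loader;
    }

    // ConcurrentHashMap.computeIfAbsent runs the loader at most once
    // per key, even when several threads ask for the same dictionary.
    T get(String resourceName) {
        return cache.computeIfAbsent(resourceName, name -> {
            loads.incrementAndGet();
            return loader.apply(name);
        });
    }

    // Number of times the loader actually ran (useful for checking reuse).
    int loadCount() {
        return loads.get();
    }
}
```

With something like this, a token filter factory would call `get(...)` on a shared cache instead of building the dictionary in its own constructor. Whether that alone resolves the memory growth reported here is untested; it only avoids repeating the build for every index.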
