Baseform: memory optimization
ThaDafinser opened this issue
When I use the baseform plugin on a large number of documents (> 1,000,000), I get this error:
[2017-04-06T07:28:07,712][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [ultimate-1] fatal error in thread [elasticsearch[ultimate-1][clusterService#updateTask][T#1]], exiting
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236) ~[?:1.8.0_121]
at org.xbib.elasticsearch.common.fsa.FSABuilder.expandBuffers(FSABuilder.java:468) ~[?:?]
at org.xbib.elasticsearch.common.fsa.FSABuilder.serialize(FSABuilder.java:418) ~[?:?]
at org.xbib.elasticsearch.common.fsa.FSABuilder.freezeState(FSABuilder.java:352) ~[?:?]
at org.xbib.elasticsearch.common.fsa.FSABuilder.add(FSABuilder.java:204) ~[?:?]
at org.xbib.elasticsearch.common.fsa.Dictionary.loadLines(Dictionary.java:43) ~[?:?]
at org.xbib.elasticsearch.index.analysis.baseform.BaseformTokenFilterFactory.createDictionary(BaseformTokenFilterFactory.java:39) ~[?:?]
at org.xbib.elasticsearch.index.analysis.baseform.BaseformTokenFilterFactory.<init>(BaseformTokenFilterFactory.java:27) ~[?:?]
at org.xbib.elasticsearch.plugin.bundle.BundlePlugin$$Lambda$379/386311625.get(Unknown Source) ~[?:?]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:361) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:171) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:155) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.IndexService.<init>(IndexService.java:145) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:363) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:427) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:392) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.execute(MetaDataCreateIndexService.java:364) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cluster.service.ClusterService.executeTasks(ClusterService.java:679) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cluster.service.ClusterService.calculateTaskOutputs(ClusterService.java:658) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:617) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cluster.service.ClusterService$UpdateTask.run(ClusterService.java:1117) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:544) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:238) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:201) ~[elasticsearch-5.3.0.jar:5.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Thanks for the report.
It's not a leak, but the FSA, as currently implemented, is quite memory-hungry when an index is created. I will investigate how to reduce its memory footprint.
Even without the baseform filter (I also removed the filter definition), there still seems to be a memory problem.
@jprante I only created 24 MB of indices/documents in total, but the JVM heap is full and Kibana is timing out again.
@jprante Sadly I'm no Java geek, but I found this approach for Hunspell in the ES repo. They use a service, so the dictionary is only loaded once.
Maybe this would be a good idea?
https://github.com/elastic/elasticsearch/blob/ee802ad63c0f21d697a5095dd05dc6f94626ee4d/core/src/main/java/org/elasticsearch/index/analysis/HunspellTokenFilterFactory.java#L44
https://github.com/elastic/elasticsearch/blob/ee802ad63c0f21d697a5095dd05dc6f94626ee4d/core/src/main/java/org/elasticsearch/indices/analysis/HunspellService.java
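To make the "load the dictionary only once" idea concrete, here is a minimal sketch of a shared, node-level cache. It is not code from the plugin or from Elasticsearch: `SharedDictionaryCache`, its `loader` function and `loadCount()` are hypothetical names for illustration, and the dictionary type is left generic because the plugin's real dictionary API is not reproduced here. Each token filter factory would ask the cache for a dictionary instead of building its own FSA in its constructor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical cache that builds each dictionary at most once per key
 * and hands the same instance to every caller. D stands in for whatever
 * dictionary type the plugin uses (an assumption, not the real API).
 */
final class SharedDictionaryCache<D> {

    private final Map<String, D> cache = new ConcurrentHashMap<>();
    private final Function<String, D> loader;
    private final AtomicInteger loads = new AtomicInteger();

    SharedDictionaryCache(Function<String, D> loader) {
        this.loader = loader;
    }

    /** Returns the cached dictionary for the key, loading it on first use. */
    D get(String language) {
        // computeIfAbsent runs the loader at most once per key,
        // even when several threads ask for the same key concurrently.
        return cache.computeIfAbsent(language, key -> {
            loads.incrementAndGet();
            return loader.apply(key);
        });
    }

    /** Number of times the expensive loader actually ran. */
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        // Stand-in loader: a real one would read the word list and build the FSA.
        SharedDictionaryCache<String> cache =
                new SharedDictionaryCache<>(lang -> "dictionary for " + lang);

        // Simulate three index creations that all use the same language.
        cache.get("de");
        cache.get("de");
        cache.get("de");

        System.out.println("loader invocations: " + cache.loadCount());
    }
}
```

With a cache like this, creating many indices that use the same filter would pay the dictionary build cost once rather than once per index. It would not reduce the peak memory needed for that single build.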