jprante / elasticsearch-plugin-bundle

A bundle of useful Elasticsearch plugins

langdetect error (500): duplicate of the same language profile, using REST endpoint

marbleman opened this issue · comments

I have noticed a strange error caused by langdetect that I never saw on my old 1.7 setup.
I am using the PHP Elasticsearch\Client, which uses Guzzle for the HTTP connection (which may or may not be part of the problem).

Everything is fine as long as I have just one active thread on the PHP server talking to the ES cluster. When I open a second thread, I randomly see exceptions in ES like this:

[2016-03-25 01:21:23,599][ERROR][org.xbib.elasticsearch.module.langdetect.LangdetectService] duplicate of the same language profile: en
java.io.IOException: duplicate of the same language profile: en
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.addProfile(LangdetectService.java:205)
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.loadProfileFromResource(LangdetectService.java:199)
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.load(LangdetectService.java:148)
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.setProfile(LangdetectService.java:223)
    at org.xbib.elasticsearch.action.langdetect.TransportLangdetectAction.doExecute(TransportLangdetectAction.java:32)
    at org.xbib.elasticsearch.action.langdetect.TransportLangdetectAction.doExecute(TransportLangdetectAction.java:16)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
    at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:351)
    at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
    at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:83)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:351)
    at org.xbib.elasticsearch.rest.action.langdetect.RestLangdetectAction.handleRequest(RestLangdetectAction.java:30)
    at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:207)

The language is different in each log entry, and each log entry seems to relate to a different request.
I am using the REST endpoint and I have limited the languages in elasticsearch.yml to about 10 languages.
Before I dig deeper by experimenting with combinations of settings and all that time-consuming stuff, I hope you can give me a hint about the best starting point for the investigation.

Thx in advance!

Looks like a race condition. LangdetectService is not thread-safe. I think it will help to synchronize the call to LangdetectService in TransportLangdetectAction.

Thanks for the hint!! However, I am afraid that kind of change is beyond what I can do myself at the moment.
AFAIK the ES PHP client uses round robin across all cluster nodes. The race condition probably comes up when two requests hit the same node at the same time. This would explain the strange random factor.

I'll give it a try to direct each thread to a dedicated cluster node.

Yes, two threads executing on the same node at once is the race condition. I will push a fix today; it just wraps the execution of detectAll in a synchronized statement.
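
For readers following along, here is a minimal, self-contained sketch of the kind of fix described above. It is not the plugin's actual code: `ProfileService`, `SynchronizedDetectSketch`, `execute` and `countFailures` are hypothetical names, and the sketch assumes that the profile reload and the detection call are guarded by the same lock. The stand-in service reloads its profiles on every request and throws on a duplicate language, which mirrors the shape of the stack trace in the report.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SynchronizedDetectSketch {

    /** Hypothetical stand-in for a non-thread-safe service; NOT the plugin's real class. */
    static class ProfileService {
        private final List<String> loaded = new ArrayList<>();

        // Reloads the profile list; throws if a language is registered twice.
        void setProfile(List<String> languages) throws IOException {
            loaded.clear();
            for (String lang : languages) {
                if (loaded.contains(lang)) {
                    throw new IOException("duplicate of the same language profile: " + lang);
                }
                Thread.yield(); // widens the window in which another thread could interleave
                loaded.add(lang);
            }
        }

        // Placeholder detection: returns the first loaded language, ignoring the text.
        String detectAll(String text) {
            return loaded.isEmpty() ? "unknown" : loaded.get(0);
        }
    }

    private final ProfileService service = new ProfileService();

    // Only one thread at a time may reload profiles and run detection.
    String execute(List<String> languages, String text) throws IOException {
        synchronized (service) {
            service.setProfile(languages);
            return service.detectAll(text);
        }
    }

    // Fires concurrent requests at one shared instance and counts failed requests.
    static int countFailures(int threads, int requestsPerThread) throws Exception {
        SynchronizedDetectSketch action = new SynchronizedDetectSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> results = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            results.add(pool.submit(() -> {
                int failures = 0;
                for (int i = 0; i < requestsPerThread; i++) {
                    try {
                        action.execute(Arrays.asList("en", "de", "fr"), "hello");
                    } catch (IOException | RuntimeException e) {
                        failures++;
                    }
                }
                return failures;
            }));
        }
        int total = 0;
        for (Future<Integer> f : results) {
            total += f.get();
        }
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("failures: " + countFailures(4, 1000)); // prints "failures: 0"
    }
}
```

Without the `synchronized` block, two threads could interleave inside `setProfile` and both register the same language, which is one plausible way to end up with the "duplicate of the same language profile" error. The trade-off is that detection requests on a single node are serialized.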

Amazing!! Unfortunately I cannot install it:

ERROR: java.lang.IllegalStateException: jar hell!
class: org.apache.lucene.analysis.ar.ArabicAnalyzer$DefaultSetHolder
jar1: /usr/share/elasticsearch/lib/lucene-analyzers-common-5.4.1.jar
jar2: /tmp/1504669576103186/temp_name-206789507/lucene-analyzers-common-5.4.1.jar

Thanks.

My build procedure is broken. As a quick fix, just remove lucene-core-5.4.1.jar and lucene-analyzers-common-5.4.1.jar from the plugins/bundle directory.

Thaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaank you so much! Runs like hell, but without jar hell now... and multithreaded without any errors!