jprante / elasticsearch-plugin-bundle

A bundle of useful Elasticsearch plugins

_langdetect endpoint missing in ES 2.2?

marbleman opened this issue

With ES 1.7 we used the _langdetect endpoint to verify the language of a document prior to indexing it, following the examples from https://github.com/jprante/elasticsearch-langdetect.

Trying the same with ES 2.2 and bundle 2.2.0.1, the example query now returns:

curl -XPOST 'localhost:9200/_langdetect?pretty' -d 'Das ist ein Test'
{
  "error" : {
    "root_cause" : [ {
      "type" : "invalid_index_name_exception",
      "reason" : "Invalid index name [_langdetect], must not start with '_'",
      "index" : "_langdetect"
    } ],
    "type" : "invalid_index_name_exception",
    "reason" : "Invalid index name [_langdetect], must not start with '_'",
    "index" : "_langdetect"
  },
  "status" : 400
}

Is the endpoint still available somewhere?

No comments? Hope my question wasn't too birdbrained... ;) However, if so, I would appreciate a hint on what I am missing...

Sorry, I overlooked the issue.

I released 2.2.0.2 with a fix.

The download link for the plugin zip file is:

https://github.com/jprante/elasticsearch-plugin-bundle/releases/download/2.2.0.2/elasticsearch-plugin-bundle-2.2.0.2-plugin.zip
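For anyone scripting the upgrade, here is a small sketch that builds the release URL from a version string. It assumes that other releases follow the same naming pattern as the 2.2.0.2 link above; the thread only confirms that one link, and the function name `plugin_zip_url` is purely illustrative.

```shell
#!/bin/sh
# Sketch: build the release zip URL for a given plugin bundle version.
# ASSUMPTION: releases other than 2.2.0.2 follow the same naming pattern;
# this thread only confirms the 2.2.0.2 link. plugin_zip_url is a hypothetical helper.
plugin_zip_url() {
  version="$1"
  printf 'https://github.com/jprante/elasticsearch-plugin-bundle/releases/download/%s/elasticsearch-plugin-bundle-%s-plugin.zip\n' \
    "$version" "$version"
}
```

On ES 2.x, plugins can be installed from a URL with `bin/plugin install <url>`, so `bin/plugin install "$(plugin_zip_url 2.2.0.2)"` should fetch this release. An already installed older version has to be removed first, and the node must be restarted before the plugin is active.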

Thanks a lot for your response! I installed it right away. Unfortunately, I get an error whether I run the request from Sense or from the command line:

curl -XPOST 'localhost:9200/_langdetect?pretty' -d 'Das ist ein Test'
{
  "error" : {
    "root_cause" : [ {
      "type" : "illegal_state_exception",
      "reason" : "failed to find action [org.xbib.elasticsearch.action.langdetect.LangdetectAction@d8b70e11] to execute"
    } ],
    "type" : "illegal_state_exception",
    "reason" : "failed to find action [org.xbib.elasticsearch.action.langdetect.LangdetectAction@d8b70e11] to execute"
  },
  "status" : 500
}

OK, that was the reason why I removed the REST action. I have to investigate how to solve this class loader issue.

Thanks in advance! IMHO the _langdetect REST endpoint is quite an important feature, since it allows checking the language prior to indexing. Each document can then be sent to the right index, the one with the appropriate analyzers for that language.

Attaching the right analyzer is not what the REST endpoint is meant for.

In ES 1.x this was possible by assigning an analyzer path; ES 2.x removed that option. I will implement a multi-field name extension that sets the language analyzers automatically, following https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html#_analyze_multiple_times
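The "analyze multiple times" approach from the linked guide can be sketched as a plain ES 2.x mapping in which one field gets a sub-field per language analyzer. This is the manual mapping, not the automatic extension announced above; the type name `doc`, the field `title`, the de/en selection and the helper `multilang_mapping` are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: ES 2.x mapping body that indexes "title" once per language analyzer.
# ASSUMPTIONS: type "doc", field "title" and the de/en choice are illustrative;
# multilang_mapping is a hypothetical helper, not part of the plugin bundle.
multilang_mapping() {
  cat <<'EOF'
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "de": { "type": "string", "analyzer": "german" },
            "en": { "type": "string", "analyzer": "english" }
          }
        }
      }
    }
  }
}
EOF
}
```

The body could be sent with `multilang_mapping | curl -XPUT 'localhost:9200/myindex' -d @-`; queries can then target `title`, `title.de` and `title.en` together, for example with a `multi_match` query.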

Thanks for finding the typo.

This is probably not the right place to discuss "best practices" (which I would be interested in), but following some recommendations found around the internet, we decided to use separate indices for each language, such as "myindex_de" and "myindex_en". Therefore, we have to detect the language prior to indexing. This way we can search "myindex_*" to get results in multiple languages, and we avoid all the trouble with mixed languages.
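The per-language index workflow described here (detect first, then pick the index) can be sketched in a few lines of shell. It assumes that the _langdetect reply lists entries of the form `"language" : "xx"` with the most probable language first; that format is not shown in this thread. The supported set (de, en), the fallback index `myindex_other` and both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch of per-language routing: detect the language, then choose the index.
# ASSUMPTIONS (not confirmed in this thread): the _langdetect reply contains
# "language" : "xx" entries with the most probable language listed first;
# the supported set (de, en) and the fallback "myindex_other" are hypothetical.

# Read a _langdetect JSON reply on stdin and print the first language code.
# Works for both pretty-printed and single-line replies.
top_language() {
  grep -o '"language" *: *"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Map a language code to the per-language index holding the matching analyzers.
index_for_language() {
  case "$1" in
    de|en) printf 'myindex_%s\n' "$1" ;;
    *)     printf 'myindex_other\n' ;;
  esac
}
```

With a running node and a reply in the assumed format, `lang=$(curl -s -XPOST 'localhost:9200/_langdetect' -d 'Das ist ein Test' | top_language)` followed by `index_for_language "$lang"` yields the target index name. Searching across all languages can then use a wildcard index pattern such as `localhost:9200/myindex_*/_search`.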