mallocator / Elasticsearch-Exporter

A small script to export data from one Elasticsearch cluster into another.

Mappings not updated on target correctly

mpcarl opened this issue · comments

I am trying to copy an index from one cluster to another. Using the options file, I get the following output:

```
mpcarl $: node exporter.js -o meetings.json
Elasticsearch Exporter - Version 1.4.0
Reading source statistics from ElasticSearch
Reading mapping from ElasticSearch
Reading mapping from ElasticSearch
Creating index mapping in target ElasticSearch instance
Creating index mapping in target ElasticSearch instance
Mapping is now ready. Starting with 0 queued hits.
Mapping is now ready. Starting with 0 queued hits.
Host target_host:9200 responded to PUT request on endpoint /meetings with an error:
{"error":"IndexAlreadyExistsException[[meetings] already exists]","status":400}
Processed 200 of 3611 entries (6%) .....
```

When I look at the mappings at the end of the job, they don't match the source mappings. Everything looks to be just defaults.

```
mappings: {
  Meetings: {
    properties: {
      created_by: {
        type: string
      },
      date_entered: {
        type: string
      }
```
versus the source

```
created_by: {
  type: "string",
  index: "not_analyzed",
  omit_norms: true,
  index_options: "docs"
},
date_end: {
  type: "date",
  format: "dateOptionalTime"
},
date_entered: {
  type: "date",
  format: "yyyy-MM-dd HH:mm:ss"
},
```

I also don't understand why it is trying to update the target twice, but maybe that's a different issue.

Elastic version "0.90.7" on both servers.

commented

Which version of Elasticsearch are you using? Can you include your options file?

From the log it seems like there was already an index when the mappings call returned. I'm not sure why it's running the update twice, unless there are multiple indices defined.

Elastic version "0.90.7" on both servers. I am manually deleting the existing index from the target server before running the script.

Config:

```
{
  "sourceHost": "server1",
  "sourcePort": 443,
  "sourceIndex": "meetings",
  "sourceUseSSL": true,
  "sourceAuth": "user:password",

  "targetHost": "server2",
  "targetPort": 9200,
  "targetIndex": "meetings",
  "targetUseSSL": true,
  "targetAuth": "user:password",

  "logEnabled": true,
  "insecure": true
}
```

commented

I can't reproduce the problem. Without more feedback there is not much I can do right now.

I am happy to provide more feedback. What else do you need to know?

commented

I'm not sure; can you tell me a little more about your setup? How many nodes are in the cluster? Is there anything special about the index? How many types are in the index? What do you use to tunnel SSL? Where are you running the importer from?

The source cluster has five nodes with nine or so indices on it. I am trying to copy one of those indices to a target cluster that has the same number of nodes. I have tried with an empty target cluster and a cluster with a few indices, though without the same index being copied. There are several analyzer configurations on the index, which are set correctly (e.g. index.analysis.filter.default_ngram_filter.type: nGram).

The index has one type named Meetings (i.e. index name: meetings, type name: Meetings).

SSL is implemented using Apache 2.4, which accepts connections on port 9200 and forwards them to the Elasticsearch cluster listening on port 9201.
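For reference, a reverse-proxy vhost of the sort described might look roughly like this (a sketch assuming mod_ssl and mod_proxy_http are loaded; the certificate paths are placeholders, not taken from this thread):

```
Listen 9200
<VirtualHost *:9200>
    SSLEngine on
    SSLCertificateFile    /path/to/server.crt
    SSLCertificateKeyFile /path/to/server.key

    # Forward all requests to the Elasticsearch node listening on 9201
    ProxyPass        / http://localhost:9201/
    ProxyPassReverse / http://localhost:9201/
</VirtualHost>
```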

I have tried Node.js + Elasticsearch-Exporter on Mac and RHEL. Neither of these machines hosts the Elasticsearch cluster nodes. My Node.js version is 0.10.32.

commented

Thanks for the details. Until I can replicate the problem and fix the error I can offer a workaround:

First create the index with the right mapping and then start the export. I know it's a hassle but at least you can continue working.
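As a sketch of that workaround, the index-creation body could be assembled like this before running the exporter (`buildIndexBody` is a hypothetical helper, not part of Elasticsearch-Exporter; the field and analyzer definitions are quoted from this thread):

```javascript
// Build the body for PUT /meetings on the target cluster so the index is
// created with the source mapping and settings before the exporter runs.
// Hypothetical helper -- illustrative only.
function buildIndexBody(typeName, properties, settings) {
  var mappings = {};
  mappings[typeName] = { properties: properties };  // ES expects { mappings: { <type>: ... } }
  return {
    settings: settings,   // analyzer configuration must be sent on create
    mappings: mappings
  };
}

// Field definitions quoted from the source mapping shown above:
var body = buildIndexBody('Meetings', {
  created_by:   { type: 'string', index: 'not_analyzed', omit_norms: true, index_options: 'docs' },
  date_entered: { type: 'date', format: 'yyyy-MM-dd HH:mm:ss' }
}, { 'index.analysis.filter.default_ngram_filter.type': 'nGram' });

// PUT this body to the target (e.g. https://target_host:9200/meetings),
// then run `node exporter.js -o meetings.json` as before.
console.log(JSON.stringify(body));
```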

I'm going to look further into this, but no promise as to how fast I can find a fix.

When the target index already exists, the mappings from the source fail to copy over.
Even for a target index that does not exist, the mappings are not copied over properly.
What I observe is that source mappings with multi-fields show up as plain string fields on the target index.
I'm using ES 0.90.13.

commented

@ajaydivakaran When the target index already exists, that's normal Elasticsearch behavior: you can't overwrite an existing index mapping, you can only add new types with their individual mappings.

And for the mappings not being copied over properly in general, that's something that's still under investigation. I just haven't had time to deal with the problem so far.

@mallocator Thanks for your reply.

I think I see what is going on now. When the settings and mappings are retrieved from the source server, they come in the format:

```
{"Meetings":{"properties":{"a...},"settings":{"index.analysis.an...}
```

However, that is not the correct format to send to the target server: the mapping section is missing the `"mappings":` wrapper. The PUT to the target server should instead look like:

```
{"mappings":{"Meetings":{"properties":{"a...} } < additional } needed> ,"settings":{"index.analysis.an...}
```

I added

```
"sourceType": "Meetings",
```

to my config file. This might get the mappings sent correctly, but the settings are not sent on create, so the mapping fails because of the missing settings.
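The reshaping described above can be sketched as follows (`wrapForTarget` is an illustrative helper, not a function in Elasticsearch-Exporter, and the field definitions are abbreviated from this thread):

```javascript
// Illustrative only: reshape what GET /<index>/_mapping returns
// ({ Meetings: { properties: ... } }) into the body that PUT /<index>
// on the target expects ({ mappings: ..., settings: ... }).
function wrapForTarget(sourceMapping, sourceSettings) {
  return {
    mappings: sourceMapping,   // nest the per-type mapping under "mappings"
    settings: sourceSettings   // settings must also go in the create request
  };
}

// Abbreviated versions of the structures quoted in this thread:
var fromSource     = { Meetings: { properties: { created_by: { type: 'string' } } } };
var sourceSettings = { 'index.analysis.filter.default_ngram_filter.type': 'nGram' };

var putBody = wrapForTarget(fromSource, sourceSettings);
// putBody.mappings.Meetings.properties is now where ES 0.90 expects it.
console.log(JSON.stringify(putBody));
```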

commented

Yeah there are some unresolved issues with the mapping that I still have to address.

I'm currently in the middle of a bigger rewrite, at a time when I'm unfortunately really busy with other things. So a fix will come, but it'll take time.

commented

With the rewrite this should no longer apply.