uken / fluent-plugin-elasticsearch


Specify ElasticSearch index template

chschs opened this issue

I'm using fluentd with the in_syslog plugin and the elasticsearch plugin to get syslog into Elasticsearch, with a Kibana frontend.

One of the problems I'm having, though, is that the fields are analyzed when indexed in Elasticsearch, so when I add a terms dashboard in Kibana to show, say, the top 10 hostnames, hostnames with dashes in them get broken up: mysql-test-01 comes across as three hostnames: mysql, test, and 01.

Logstash got around this issue by making a "raw" version of several fields that is set to not_analyzed upon index creation, so that you can run your dashboards against that instead.

More information here: http://www.elasticsearch.org/blog/logstash-1-3-1-released/

With syslog messages going into ES with this plugin, I'd like to have a "raw" (non-analyzed) host (hostname) field and ident field (which gives me the application). Unfortunately, right now both of those fields are analyzed, and that's breaking our dashboards.

Hey @chschs, have you tried adding a mapping template to change the index settings?

Example:

{
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }, 
      "_source": { "compress": true },
      "properties" : {
        "event_data": { "type": "object", "store": "no" },
        "@fields": { "type": "object", "dynamic": true, "path": "full" }, 
        "@message": { "type": "string", "index": "analyzed" },
        "@source": { "type": "string", "index": "not_analyzed" },
        "@source_host": { "type": "string", "index": "not_analyzed" },
        "@source_path": { "type": "string", "index": "not_analyzed" },
        "@tags": { "type": "string", "index": "not_analyzed" },
        "@timestamp": { "type": "date", "index": "not_analyzed" },
        "@type": { "type": "string", "index": "not_analyzed" }    
      }   
    }
  },
  "settings": {
    "index.cache.field.type" : "soft",
    "index.refresh_interval": "5s",
    "index.store.compress.stored": true,
    "index.number_of_shards": "3", 
    "index.query.default_field": "querystring", 
    "index.routing.allocation.total_shards_per_node": "2"
  }, 
  "template": "logstash-*"
}

This will be used every time a new index matching the 'logstash-*' pattern is created.
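For reference, a template like the one above is registered via the index templates API; a minimal sketch, assuming Elasticsearch is listening on localhost:9200 and the JSON is saved as template.json (both the template name and the file name are illustrative):

$ curl -XPUT localhost:9200/_template/logstash_default -d @template.json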

+1 for making this part of the plugin. While we can manually modify the mapping, why require that overhead in an application when fluentd is already creating the original mapping?

Looking at the code, the plugin is not actually creating or modifying the index; it merely writes to the current index. This feature would have to detect that a new index is being created and then call the API to update the mapping.

Now I understand what you mean by index templates. Perhaps it's worth adding this to the README.

@ajturner good point, added 139184e

I was able to use a custom index template, based on the one Logstash uses, to auto-generate .raw versions of fields.

I started by deleting the current day's index ($ curl -XDELETE localhost:9200/logstash-2014.09.02), then used a curl PUT to set the defaults for the index. I then restarted fluentd, and the raw fields were available. You can check whether the settings are sticking in Elasticsearch:

$ curl localhost:9200/logstash-2014.09.02/_mapping?pretty

There is a compelling reason to build index template support into the plugin: in a containerized environment, we do not know when the Elasticsearch container will start, and we have to PUT the index template to it before fluentd sends any data. There are some possible workarounds, but every one of them looks really ugly. So +1 for built-in support of index templates. @pitr
Since version 2.x, Elasticsearch supports index template creation only via the API.

+1 for built-in support of index templates.

To implement this behaviour, this gem would need to do the following additional work before writing a record to elasticsearch (a rough sketch follows the list):

  1. Check if the index exists
  2. If not, create the index and the mapping
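In curl terms, that per-write check would look something like this (the index name and mappings file are hypothetical; the gem itself would issue the equivalent HTTP calls):

# HEAD on an index returns 200 if it exists and 404 otherwise; -f makes curl fail on 404
if ! curl -sfI localhost:9200/logstash-2015.06.01 > /dev/null; then
  curl -XPUT localhost:9200/logstash-2015.06.01 -d @settings-and-mappings.json
fi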

Seems like that might impact performance. What do you think, @aleks-v-k @ssergiienko? What about deleting old indices?

Can you provide more information about how you run ElasticSearch in "containerized environment"? Is this something Cloudlinux is working on (I'm not familiar with their offerings)?

The gem would only need to write the index template once at startup, since a wildcard match can be used. As seen in the Logstash template, just setting "template": "logstash-*" should take care of things.
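That one-time write could even be made idempotent; a minimal shell sketch of the idea (the template name and file are illustrative):

# GET on a missing template returns 404, so -f makes curl fail and trigger the PUT
if ! curl -sf localhost:9200/_template/logstash > /dev/null; then
  curl -XPUT localhost:9200/_template/logstash -d @logstash-template.json
fi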

Hmm, sounds like a good idea then

@pitr Thanks for the reply, and excuse me for the late answer.

> To implement this behaviour, this gem would need to do the following additional work before writing a record to elasticsearch

I suppose there is some sort of initialization point in the fluentd plugin system, so it could be done there: not for each individual index, but for a group of indices, as @stanhu mentioned.

> Can you provide more information about how you run ElasticSearch in "containerized environment"? Is this something Cloudlinux is working on (I'm not familiar with their offerings)?

It is a group of hosts, each of which runs the docker daemon and has a pair of fluentd and elasticsearch containers to collect and access logs. It is not for CloudLinux OS; it is part of the KuberDock project, which uses Kubernetes. We control the configuration of the elasticsearch and fluentd containers through our own docker images, but we do not start and stop them manually. They run automatically on every host added to a KuberDock cluster.

Would love to see this implemented. Using the tag log-opt in docker with {{.ImageName}} results in tags with -'s and :'s. I need to set the tag field to not_analyzed so I can properly search for docker images, but having to do it through the ES API goes against my desire to keep everything in source control.
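For that particular case, a small index template along these lines should work (the logstash-* pattern and the tag field name are assumptions based on the setup described above):

{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "tag": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}

Keeping that JSON in source control and PUTting it to _template/ at deploy time works as a stopgap until the plugin can do it for you.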

Any chance of this happening? Would love to contribute, but don't have enough time to learn Ruby at the moment.

I've implemented this and it's working for me.

#194

Implemented with #194, thanks @aerickson.
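For readers arriving later: with #194 merged, the template can live in source control and be loaded by the plugin at startup. A minimal match section, assuming a plugin version that includes the change (the file path is illustrative):

<match **>
  type elasticsearch
  host localhost
  port 9200
  logstash_format true
  template_name logstash
  template_file /etc/fluent/logstash-template.json
</match>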