uken / fluent-plugin-elasticsearch


Specify ElasticSearch index template

chschs opened this issue

I'm using fluentd with the in_syslog plugin and the elasticsearch plugin to get syslog into Elasticsearch, with a Kibana frontend.

One of the problems I'm having, though, is that the fields are analyzed when indexed in Elasticsearch, so when I add a terms dashboard in Kibana to show, say, the top 10 hostnames, hostnames with dashes in them get broken up: mysql-test-01 comes across as three hostnames: mysql, test, and 01.

Logstash got around this issue by making a "raw" version of several fields that is set to not_analyzed upon index creation, so that you can run your dashboards against that instead.

More information here: http://www.elasticsearch.org/blog/logstash-1-3-1-released/

With syslog messages going into ES with this plugin, I'd like to have a "raw" (non-analyzed) host (hostname) field and ident field (which gives me the application). Unfortunately, right now both of those fields are analyzed, and that's breaking our dashboards.

Hey @chschs, have you tried adding a mapping template to change the index settings?

Example:

{
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }, 
      "_source": { "compress": true },
      "properties" : {
        "event_data": { "type": "object", "store": "no" },
        "@fields": { "type": "object", "dynamic": true, "path": "full" }, 
        "@message": { "type": "string", "index": "analyzed" },
        "@source": { "type": "string", "index": "not_analyzed" },
        "@source_host": { "type": "string", "index": "not_analyzed" },
        "@source_path": { "type": "string", "index": "not_analyzed" },
        "@tags": { "type": "string", "index": "not_analyzed" },
        "@timestamp": { "type": "date", "index": "not_analyzed" },
        "@type": { "type": "string", "index": "not_analyzed" }    
      }   
    }
  },
  "settings": {
    "index.cache.field.type" : "soft",
    "index.refresh_interval": "5s",
    "index.store.compress.stored": true,
    "index.number_of_shards": "3", 
    "index.query.default_field": "querystring", 
    "index.routing.allocation.total_shards_per_node": "2"
  }, 
  "template": "logstash-*"
}

This will be used every time a new index matching the 'logstash-*' pattern is created.
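For reference, a template like the one above is registered via the index templates API; a minimal sketch, assuming Elasticsearch is listening on localhost:9200 and the JSON is saved as template.json (both the template name and the file name are illustrative):

$ curl -XPUT localhost:9200/_template/logstash_default -d @template.json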

+1 for making this part of the plugin. While we can manually modify the mapping, why require that overhead in an application when fluentd is already creating the original mapping?

Looking at the code, the plugin is not actually creating or modifying the index; it merely writes to the current index. This feature would have to detect that a new index is being created and then call the API to update the mapping.

Now I understand what you mean by index templates. Perhaps it's worth adding this to the README.

@ajturner good point, added 139184e

I was able to use a custom index template, based on the one Logstash uses, to auto-generate .raw versions of fields.

I started by deleting the current day's index ($ curl -XDELETE localhost:9200/logstash-2014.09.02), then used a curl PUT to set the defaults for the index. I then restarted fluentd, and the raw fields were available. You can check whether the settings are sticking in Elasticsearch:

$ curl localhost:9200/logstash-2014.09.02/_mapping?pretty

There is a compelling reason to build index template support into the plugin: in a containerized environment, we do not know when the Elasticsearch container will start, and we have to PUT the index template to it before fluentd sends any data. There are some possible workarounds, but every one of them looks really ugly. So +1 for built-in support of index templates. @pitr
Since version 2.x, Elasticsearch supports index template creation only via the API.

+1 for built-in support of index templates.

To implement this behaviour, this gem would need to do the following additional work before writing a record to elasticsearch (a rough sketch follows the list):

  1. Check if the index exists
  2. If not, create the index and the mapping
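In curl terms, that per-write check would look something like this (the index name and mappings file are hypothetical; the gem itself would issue the equivalent HTTP calls):

# HEAD on an index returns 200 if it exists and 404 otherwise; -f makes curl fail on 404
if ! curl -sfI localhost:9200/logstash-2015.06.01 > /dev/null; then
  curl -XPUT localhost:9200/logstash-2015.06.01 -d @settings-and-mappings.json
fi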

Seems like that might impact performance. What do you think, @aleks-v-k @ssergiienko? What about deleting old indices?

Can you provide more information about how you run ElasticSearch in "containerized environment"? Is this something Cloudlinux is working on (I'm not familiar with their offerings)?

The gem would only need to write the index template once at startup, since a wildcard match can be used. As seen in the Logstash template, just setting "template": "logstash-*" should take care of things.
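That one-time write could even be made idempotent; a minimal shell sketch of the idea (the template name and file are illustrative):

# GET on a missing template returns 404, so -f makes curl fail and trigger the PUT
if ! curl -sf localhost:9200/_template/logstash > /dev/null; then
  curl -XPUT localhost:9200/_template/logstash -d @logstash-template.json
fi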

Hmm, sounds like a good idea then

@pitr Thanks for the reply, and excuse me for the late answer.

> To implement this behaviour, this gem would need to do the following additional work before writing a record to elasticsearch

I suppose there is some sort of initialization point in the fluentd plugin system, so it could be done there: not for each individual index, but for a group of indices, as @stanhu mentioned.

> Can you provide more information about how you run ElasticSearch in "containerized environment"? Is this something Cloudlinux is working on (I'm not familiar with their offerings)?

It is a group of hosts, each of which runs the docker daemon and has a pair of fluentd and elasticsearch containers to collect and access logs. It is not for CloudLinux OS; it is part of the KuberDock project, which uses Kubernetes. We control the configuration of the elasticsearch and fluentd containers through our own docker images, but we do not start and stop them manually. They run automatically on every host added to a KuberDock cluster.

Would love to see this implemented. Using the tag log-opt in docker with {{.ImageName}} results in tags with -'s and :'s. I need to set the tag field to not_analyzed so I can properly search for docker images, but having to do it through the ES API goes against my desire to keep everything in source control.
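For that particular case, a small index template along these lines should work (the logstash-* pattern and the tag field name are assumptions based on the setup described above):

{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "tag": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}

Keeping that JSON in source control and PUTting it to _template/ at deploy time works as a stopgap until the plugin can do it for you.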

Any chance of this happening? Would love to contribute, but don't have enough time to learn Ruby at the moment.

I've implemented this and it's working for me.

#194

Implemented with #194, thanks @aerickson.
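For readers arriving later: with #194 merged, the template can live in source control and be loaded by the plugin at startup. A minimal match section, assuming a plugin version that includes the change (the file path is illustrative):

<match **>
  type elasticsearch
  host localhost
  port 9200
  logstash_format true
  template_name logstash
  template_file /etc/fluent/logstash-template.json
</match>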