graylog-labs / graylog-plugin-metrics-reporter

Graylog Metrics Reporter Plugins

Home Page: https://www.graylog.org/

ES Indexing errors with this plugin

TotalGriffLock opened this issue

I'm running Graylog 4.11 with version 3.0.0 of the metrics-reporter-gelf plugin, which logs metrics back into Graylog. I've done no plugin configuration beyond setting

metrics_gelf_enabled = true

in server.conf.

Most metrics are logged every 15 seconds as expected, but some are clearly being dropped: I have around 100k indexing failures. I narrowed it down to this plugin by routing all messages from my GELF input into a separate index. This plugin is the only thing sending GELF messages to that input, and the input listens only on localhost, so outside interference is ruled out.

Every 5 minutes I get these indexer failures:

Timestamp Index Letter ID Error message
a few seconds ago gelf_0 0786ab1e-f535-11eb-8a1b-00155d366e62 ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id '0786ab1e-f535-11eb-8a1b-00155d366e62'. Preview of field's value: 'Wed Aug 04 15:01:52 UTC 2021']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "Wed Aug 04 15:01:52 UTC 2021"]];
a few seconds ago gelf_0 0785c096-f535-11eb-8a1b-00155d366e62 ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id '0785c096-f535-11eb-8a1b-00155d366e62'. Preview of field's value: '[]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "[]"]];
a few seconds ago gelf_0 fe953d41-f534-11eb-8a1b-00155d366e62 ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'fe953d41-f534-11eb-8a1b-00155d366e62'. Preview of field's value: '[]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "[]"]];
a few seconds ago gelf_0 fe95b270-f534-11eb-8a1b-00155d366e62 ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'fe95b270-f534-11eb-8a1b-00155d366e62'. Preview of field's value: 'Wed Aug 04 15:01:52 UTC 2021']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "Wed Aug 04 15:01:52 UTC 2021"]];
a few seconds ago gelf_0 f5aafb66-f534-11eb-8a1b-00155d366e62 ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'f5aafb66-f534-11eb-8a1b-00155d366e62'. Preview of field's value: 'Wed Aug 04 15:01:52 UTC 2021']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "Wed Aug 04 15:01:52 UTC 2021"]];
a few seconds ago gelf_0 f5aa8648-f534-11eb-8a1b-00155d366e62 ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'f5aa8648-f534-11eb-8a1b-00155d366e62'. Preview of field's value: '[]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "[]"]];
a minute ago gelf_0 ecb82e12-f534-11eb-8a1b-00155d366e62 ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'ecb82e12-f534-11eb-8a1b-00155d366e62'. Preview of field's value: '[]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "[]"]];
a minute ago gelf_0 ecb87c4e-f534-11eb-8a1b-00155d366e62 ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'ecb87c4e-f534-11eb-8a1b-00155d366e62'. Preview of field's value: 'Wed Aug 04 15:01:52 UTC 2021']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "Wed Aug 04 15:01:52 UTC 2021"]];

My understanding is that Graylog derives the field types for this input from the message content and sets them as the index's template in ES. Field refresh on this index is set to 5 seconds. I assume that something is logging a timestamp into a field that the ES indexer has determined should be a long, and something else is logging [] into a field defined as a long. So I think this could be resolved with a static ES template for this index?
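A static mapping of the kind suggested here could be sketched as follows. This is only an illustration, not a confirmed fix from this thread: the template name `gelf-custom-mapping` is hypothetical, the choice of `keyword` for `value` is an assumption, and the script only prints the request body rather than contacting Elasticsearch. It assumes the same legacy `_template` endpoint that is queried elsewhere in this issue.

```python
import json

# Hypothetical template name, chosen for illustration only.
TEMPLATE_NAME = "gelf-custom-mapping"


def build_custom_template(index_pattern: str = "gelf_*") -> dict:
    """Build a legacy index template body that pins `value` to a string type."""
    return {
        "index_patterns": [index_pattern],
        # The generated template in this issue has order -1; a higher order
        # takes precedence when both templates match the same index.
        "order": 0,
        "mappings": {
            "properties": {
                # Assumption: store every metric value as a keyword so that
                # timestamps and "[]" no longer fail to parse as a long.
                "value": {"type": "keyword"},
            }
        },
    }


if __name__ == "__main__":
    body = build_custom_template()
    print(f"PUT /_template/{TEMPLATE_NAME}")
    print(json.dumps(body, indent=2))
```

Templates are applied only when an index is created, so a mapping like this would take effect after the next index rotation. Mapping `value` as `keyword` would also give up numeric aggregations on that field, which is a trade-off to weigh.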

Any suggestions on how to resolve this would be gratefully received.

Here's the dynamic template generated for this index (and therefore for this plugin's messages, because nothing else logs to that input):

$ curl -X GET "localhost:9200/_template/gelf-template?pretty=true"

{
  "gelf-template" : {
    "order" : -1,
    "index_patterns" : [
      "gelf_*"
    ],
    "settings" : {
      "index" : {
        "analysis" : {
          "analyzer" : {
            "analyzer_keyword" : {
              "filter" : "lowercase",
              "tokenizer" : "keyword"
            }
          }
        }
      }
    },
    "mappings" : {
      "_source" : {
        "enabled" : true
      },
      "dynamic_templates" : [
        {
          "internal_fields" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string",
            "match" : "gl2_*"
          }
        },
        {
          "store_generic" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string"
          }
        }
      ],
      "properties" : {
        "gl2_processing_timestamp" : {
          "format" : "uuuu-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        },
        "gl2_accounted_message_size" : {
          "type" : "long"
        },
        "gl2_receive_timestamp" : {
          "format" : "uuuu-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        },
        "full_message" : {
          "fielddata" : false,
          "analyzer" : "standard",
          "type" : "text"
        },
        "streams" : {
          "type" : "keyword"
        },
        "source" : {
          "fielddata" : true,
          "analyzer" : "analyzer_keyword",
          "type" : "text"
        },
        "message" : {
          "fielddata" : false,
          "analyzer" : "standard",
          "type" : "text"
        },
        "timestamp" : {
          "format" : "uuuu-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        }
      }
    },
    "aliases" : { }
  }
}

The only field which is a [long] is gl2_accounted_message_size. So is this plugin causing that field to sometimes contain the timestamp or a null value?
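The template lists only the fields Graylog maps explicitly; dynamically mapped fields such as `value` show up in the index's own mapping instead. A minimal sketch for checking that, assuming Elasticsearch 7's get-field-mapping endpoint and the `localhost:9200` address used above (the helper names are made up for illustration):

```python
import json
from urllib.request import urlopen


def field_mapping_url(index: str, field: str, host: str = "localhost:9200") -> str:
    """Return the get-field-mapping URL for one field of one index."""
    return f"http://{host}/{index}/_mapping/field/{field}"


def extract_field_type(response: dict, index: str, field: str):
    """Pull the mapped type out of a get-field-mapping response, or None."""
    entry = response.get(index, {}).get("mappings", {}).get(field, {})
    leaf = field.split(".")[-1]  # the inner key is the field's leaf name
    return entry.get("mapping", {}).get(leaf, {}).get("type")


def fetch_field_type(index: str, field: str, host: str = "localhost:9200"):
    """Query Elasticsearch and return the mapped type of `field`, or None."""
    with urlopen(field_mapping_url(index, field, host)) as resp:
        return extract_field_type(json.load(resp), index, field)


if __name__ == "__main__":
    # Requires a reachable Elasticsearch node; index name taken from the errors above.
    print(fetch_field_type("gelf_0", "value"))
```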

I have resolved this myself, via https://community.graylog.org/t/graylog-metrics-plugin-feeding-data-via-gelf-to-graylog-causing-parsing-errors/16356/3

Most metric values are numbers, so Graylog/ES correctly decide to store the "value" field as a [long]. However, at the time of writing there are two metrics whose value is either a string (a timestamp) or a collection/array:
org.graylog2.journal.oldest-segment
jvm.threads.deadlocks
This data cannot be stored in a field of type long. The Graylog community URL above provides a solution, but only for one specific metric. I've put the GELF metrics input through a pipeline with the following rule, which has resolved the errors for me and should keep working as new non-numeric metrics are added:

rule "Cleanup: Non-numeric metrics value field"
when
  has_field("value") AND
  not is_long("value")
then
  rename_field(
    old_field: "value",
    new_field: "value_string"
  );
end

That didn't appear to be working either, but this does. Can't spend any more time on it right now, but if anyone else is having the same problem this will fix it.

rule "Cleanup: Non-numeric metrics value field"
when
  has_field("name") AND
  has_field("value") AND
  (to_string($message.name) == "org.graylog2.journal.oldest-segment" OR
   to_string($message.name) == "jvm.threads.deadlocks")
then
  let value_string = to_string($message.value);
  set_field("value_string", value_string);
  remove_field("value");
end
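For anyone who wants a generic check instead of a list of metric names, the intent of the first rule (move any non-numeric value out of the long-typed field) can be modelled outside Graylog. The Python below is only a sketch of that logic, not pipeline code, and the function names are made up. One possible reason the first rule misbehaved, which is an assumption and not confirmed in this thread, is that `is_long("value")` tests the literal string "value" instead of the field's content.

```python
def is_numeric(value) -> bool:
    """Return True if the value is a number or a string that parses as one."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; do not treat it as a metric number.
        return False
    if isinstance(value, (int, float)):
        return True
    if isinstance(value, str):
        try:
            float(value)
            return True
        except ValueError:
            return False
    return False


def split_value_field(message: dict) -> dict:
    """Copy the message, moving a non-numeric `value` into `value_string`."""
    result = dict(message)
    if "value" in result and not is_numeric(result["value"]):
        result["value_string"] = str(result.pop("value"))
    return result


if __name__ == "__main__":
    # Sample values taken from the indexer failures quoted in this issue.
    samples = [
        {"name": "org.graylog2.journal.oldest-segment",
         "value": "Wed Aug 04 15:01:52 UTC 2021"},
        {"name": "jvm.threads.deadlocks", "value": "[]"},
        {"name": "example.numeric.metric", "value": 42},  # hypothetical metric name
    ]
    for sample in samples:
        print(split_value_field(sample))
```

The check is made on the field's content, so a newly added non-numeric metric would be caught without editing the rule.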