logstash-plugins / logstash-input-s3

Add filename field in logstash-input-s3

phirov opened this issue

The object name or filename in an S3 bucket is important information that identifies where each record came from (especially when there are many files in a bucket).

There are many similar requests on Stack Overflow, e.g.

Mounting S3 as a local file system and using logstash-input-file is another way to get the filename information, but this becomes more complicated when running in Docker.

Since logstash-input-file provides a path field, why can't something similar be included in logstash-input-s3?
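For reference, here is a minimal sketch (paths and field names are hypothetical) of how the file input's automatic path field is typically used; this is the kind of per-event source information being requested for S3 keys:

input {
  file {
    # hypothetical mount point for an S3 bucket mounted locally
    path => "/mnt/s3/logs/*.log"
  }
}
filter {
  mutate {
    # the file input sets "path" on every event automatically
    add_field => { "source_file" => "%{path}" }
  }
}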

We need this feature too. We need access to the filenames of the objects synchronized from S3 to Logstash for post-processing. Thank you for your great work!

This will be in version 3.1.3 of the plugin.

Thanks @ph

Hi @ph, I am currently using 3.1.4 but I don't see that as part of the input. Not sure if I am missing anything here.

@jk2l it may be that the name is non-obvious. Do you have [@metadata][s3][key]?

@todd534 nope

here is the output

{
    "@timestamp" => 2017-05-27T12:19:10.033Z,
      "@version" => "1",
       "message" => "21-May-2017 08:55:19 INFO (6): Uploading Log...\n"
}

And here is my config

input {
    s3 {
        region => 'us-east-1'
        bucket => 'mybucket'
        prefix => 'prefix/path/'
    }
    stdin { }
}
output {
  stdout { codec => rubydebug }
}

I am using the logstash:5 Docker image and here is my build:

FROM logstash:5
MAINTAINER Jacky Leung <jacky@fishpond.co.nz>

RUN logstash-plugin update logstash-input-s3
RUN chown -R logstash: /usr/share/logstash/vendor/bundle/jruby/1.9/gems/

which does give me this on build

Step 3/5 : RUN logstash-plugin update logstash-input-s3
 ---> Running in 5b3de24b0d99
Updating logstash-input-s3
Updated logstash-input-s3 3.1.2 to 3.1.4

Also confirmed after logging into the container:

root@06351116c7b5:/# logstash-plugin list --verbose logstash-input-s3
logstash-input-s3 (3.1.4)

BTW, the credentials use AWS_SESSION_TOKEN:

docker run --rm -it \
    -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
    -e AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN \
    -v $(pwd)/test/conf.d/s3input.conf:/etc/logstash/conf.d/s3input.conf  \
    <docker image ID> -f /etc/logstash/conf.d/ -i

I am testing it at the moment; this is not a permanent build.

Okay, never mind... I just realized I need to add metadata => true to the rubydebug codec.
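For reference, the output section that exposes the metadata looks like this; the S3 key only shows up in the printed event when metadata => true is set on the rubydebug codec:

output {
  stdout {
    # metadata => true makes [@metadata][s3][key] visible in the printed event
    codec => rubydebug { metadata => true }
  }
}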

Maybe the documentation could use an update; it took me a while to figure this out.

Hello, I have a similar problem. Maybe someone could tell me what I'm doing wrong?

input {
  s3 {
    access_key_id => "XXXXXX"
    secret_access_key => "XXXXXXX"
    region => "eu-west-1"
    prefix => "logs/XXXXX/"
    bucket => "xbucket"
  }
}
filter {
  grok {
    match => ["message", "(?<SmartId:>[a-zA-Z0-9]+) (?<Module:>[a-zA-Z0-9._-]+) %{TIMESTAMP_ISO8601:Datetime} %{LOGLEVEL:Severity} (?<Submodule:>[a-zA-Z0-9._-]+) %{GREEDYDATA:Logmessage}" ]
    add_field => {"receive_date" => "%{@timestamp}"}
    remove_field => "message"
  }
  if "_grokparsefailure" in [tags] {
    grok {
    }
  }
  date {
    match => ["Datetime", "YYYY-MM-dd HH:mm:ss.SSS"]
    target => "@timestamp"
    remove_field => "Datetime"
  }
}
output {
  elasticsearch {
    hosts => 'localhost:9200'
    manage_template => false
    index => 'logstash-%{+YYYY.MM.dd}'
    document_type => '%{[type]}'
  }
}

And I got the same result:

{
    "@timestamp" => 2017-08-7T12:19:10.033Z,
      "@version" => "1",
       "message" => "message text"
}

Is there any way to get more information (file name, file path)?

@Sadovnikov94
I don't extract the timestamp, so I can't comment on that. I'll paste my working config here, with which I retrieve the file key and use part of it as the document_id. An example S3 file key is tcr/39_10263.txt; I extract 10263 as the unique document ID. You can compare it with yours, and I hope it helps you find the problem.

input {
  s3 {
    access_key_id => "{{ .Env.S3_KEY }}"
    secret_access_key => "{{ .Env.S3_SECRETE }}"
    bucket => "{{ .Env.S3_BUCKET }}"
    prefix => "{{ .Env.S3_PREFIX }}"
    interval => 7200
    region => "us-east-1"
  }
}

filter {
  mutate {
    # copy the S3 object key (exposed by the input as metadata) into a regular field
    add_field => {
      "file" => "%{[@metadata][s3][key]}"
    }
  }
  # grok is its own filter plugin, not an option of mutate
  grok {
    match => { "file" => "_%{NUMBER:id}.txt" }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch"]
    index => "file"
    document_id => "%{id}"
    codec => rubydebug {
      metadata => true
    }
  }
}

Note: my Logstash version is 5.4.0. It appears the S3 input plugin bundled with Logstash is not the latest version, so in my Dockerfile I have to manually update it with RUN logstash-plugin update logstash-input-s3.

@eye8 Thanks, It helps!

@eye8 Thanks, it helped me. Just wanted to ask, is there any way to get the source bucket name and backup_bucket name in the filter section?
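One possible approach (a sketch, not confirmed in this thread): since the bucket and backup bucket names are fixed values in the pipeline configuration, they can be stamped onto every event with the input's common add_field option. The field names and the backup bucket below are hypothetical:

input {
  s3 {
    bucket => "xbucket"
    # hypothetical backup bucket
    backup_to_bucket => "xbucket-backup"
    region => "eu-west-1"
    # add_field is a common option available on every input plugin
    add_field => {
      "source_bucket" => "xbucket"
      "backup_bucket" => "xbucket-backup"
    }
  }
}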