logstash-plugins / logstash-input-s3

S3 plugin not functioning correctly for GZ files from Firehose

apatnaik14 opened this issue

I was testing the S3 plugin for a production POC: a Firehose delivery stream delivers CloudWatch Logs into an S3 bucket, and I read them from there into Logstash with the S3 input plugin.

My Logstash config is as follows:

input {
  s3 {
    bucket   => "test"
    region   => "us-east-1"
    role_arn => "test"
    interval => 10
    additional_settings => {
      "force_path_style" => true
      "follow_redirects" => false
    }
  }
}

output {
  elasticsearch {
    hosts    => ["http://localhost:9200"]
    sniffing => false
    index    => "s3-logs-%{+YYYY-MM-dd}"
  }
  stdout { codec => rubydebug }
}

As I start up Logstash locally, I can see the data reaching Logstash, but it is not in a readable format; the message field contains raw bytes, like below.

{
          "type" => "s3",
       "message" => "\u001F�\b\u0000\u0000\u0000\u0000\u0000\u0000\u0000͒�n\u00131\u0010�_��\u0015�����x���MC)\u0005D\u0016!**************************************",
      "@version" => "1",
    "@timestamp" => 2019-07-12T15:32:37.328Z
}

I also tried adding codec => "gzip_lines" to the configuration, but then Logstash was not able to process those files at all. The documentation suggests the S3 plugin supports GZ files out of the box, so I was hoping someone could point out what I am doing wrong.
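That attempt looked roughly like this (only the codec setting added to the s3 input; the rest of the config is unchanged from above):

input {
  s3 {
    # ... same settings as above ...
    codec => "gzip_lines"
  }
}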

Regards,
Arpan

Please find below version and OS information.

  • Version: Logstash 7.1.1 (Plugin logstash-input-s3-3.4.1)
  • Operating System: Ubuntu 17.04
  • Config File (if you have sensitive info, please remove it): Added above
  • Sample Data: N/A
  • Steps to Reproduce: Mentioned above.

Hi @yaauie,
I am having the same issue. Is there an update on this?
I tried several different codecs, but without any results.

Thanks a lot.

Hi @yaauie !

I was hoping to check whether there is a plan to merge the above changes into the plugin.

Regards,
Arpan

@apatnaik14 I am in a similar boat! Did you have any luck with any other workarounds you tried?

Hey @mrudrara,
I created a simple Lambda function that adds the extension to each file uploaded to the S3 bucket.
The Lambda is invoked by the S3 bucket's PUT event notification.
I can share the function if you'd like.

Thanks @Luk3rson! Really appreciate it. Did you ever run into issues with too many Lambda invocations?

@Luk3rson can you share the function, maybe as a gist?

thanks in advance

Hi @mrudrara,
Apologies for the late reply.
Here is my function: Luk3rson's GZIP Lambda convertor
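In rough outline the idea is this (a minimal sketch assuming Python 3 and boto3; the linked code is the actual version and details such as key filtering may differ):

import urllib.parse

import boto3

s3 = boto3.client("s3")


def lambda_handler(event, context):
    # Invoked by the bucket's s3:ObjectCreated:Put notification.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(".gz"):
            # Already has the extension; skip so the copy below does not
            # re-trigger this function in an endless loop.
            continue
        # Copy the object to the same key with a .gz suffix, then delete
        # the original so the Logstash S3 input only sees *.gz keys.
        s3.copy_object(
            Bucket=bucket,
            Key=key + ".gz",
            CopySource={"Bucket": bucket, "Key": key},
        )
        s3.delete_object(Bucket=bucket, Key=key)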
Regards

Hi @Luk3rson, really appreciate it. Meanwhile, while working with an AWS Support engineer, they also recommended Firehose's "Data Transformation with Lambda".
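With that feature, Firehose invokes a Lambda on each batch of records before writing to S3. For CloudWatch Logs data, which arrives base64-encoded and gzip-compressed, such a transform would roughly decompress the payload and re-emit plain log lines, something like the sketch below (illustrative only; AWS ships a blueprint for this, and control-message and error handling are omitted here):

import base64
import gzip
import json


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        # Firehose passes each record's payload base64-encoded; CloudWatch Logs
        # subscription data is additionally gzip-compressed JSON.
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))
        # Re-emit each contained log event as a plain, newline-terminated line.
        lines = "".join(e["message"] + "\n" for e in payload.get("logEvents", []))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(lines.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}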

Hi @Luk3rson, @mrudrara,

If the folder only contains gz logs, you can set this option on the s3 input (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html#plugins-inputs-s3-gzip_pattern):

gzip_pattern => ".*?$"

That way the input plugin treats all the files as gzip without having to append a .gz extension via the Lambda.
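Applied to the original config, the input block would look roughly like this (a sketch; the other settings are unchanged from the config earlier in the thread):

input {
  s3 {
    bucket       => "test"
    region       => "us-east-1"
    role_arn     => "test"
    interval     => 10
    gzip_pattern => ".*?$"
    additional_settings => {
      "force_path_style" => true
      "follow_redirects" => false
    }
  }
}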