S3 plugin not functioning correctly for GZ files from Firehose
apatnaik14 opened this issue
I was testing the S3 plugin for a production POC in which a Firehose delivery stream delivers CloudWatch logs into an S3 bucket, and I read them from there into Logstash with the S3 input plugin.
My Logstash config is below:
input {
  s3 {
    bucket => "test"
    region => "us-east-1"
    role_arn => "test"
    interval => 10
    additional_settings => {
      "force_path_style" => true
      "follow_redirects" => false
    }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    sniffing => false
    index => "s3-logs-%{+YYYY-MM-dd}"
  }
  stdout { codec => rubydebug }
}
When I start Logstash locally, I can see the data reaching Logstash, but it is not in the proper format, as shown below.
{
          "type" => "s3",
       "message" => "\u001F�\b\u0000\u0000\u0000\u0000\u0000\u0000\u0000͒�n\u00131\u0010�_��\u0015�����x���MC)\u0005D\u0016!**************************************",
      "@version" => "1",
    "@timestamp" => 2019-07-12T15:32:37.328Z
}
I also tried adding codec => "gzip_lines" to the configuration, but then Logstash was unable to process those files at all. The documentation suggests the S3 plugin supports GZ files out of the box. I was hoping someone could point out what I am doing wrong.
Regards,
Arpan
Please find below version and OS information.
- Version: Logstash 7.1.1 (Plugin logstash-input-s3-3.4.1)
- Operating System: Ubuntu 17.04
- Config File (if you have sensitive info, please remove it): Added above
- Sample Data: N/A
- Steps to Reproduce: Mentioned above.
Hi @yaauie,
I am having the same issue. Is there an update on this?
I tried several different codecs, without any results.
Thanks a lot.
Hi @yaauie!
I was hoping to check: is there a plan to merge the above changes into the plugin?
Regards,
Arpan
@apatnaik14 I am in a similar boat as you! Wondering if you had any luck with other workarounds you may have tried?
Hey @mrudrara,
I created a simple Lambda function that adds the .gz extension to each file uploaded to the S3 bucket.
The Lambda is invoked by the bucket's PUT event notification.
I can share the function if you'd like.
Thanks @Luk3rson! Really appreciate it. Did you ever run into issues with too many Lambda invocations?
Hi @mrudrara
Apologies for the late reply.
Here is my function: Luk3rson's GZIP Lambda convertor
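For anyone who just wants the idea, here is a minimal sketch of this rename approach, assuming Python with boto3 and an S3 PUT event trigger (the linked code may differ; the handler name and guard are illustrative):

import boto3
import urllib.parse

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by the bucket's PUT event notification.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Skip objects that already carry the extension, otherwise
        # the rename re-triggers this function on its own output.
        if key.endswith(".gz"):
            continue
        # S3 has no rename: copy to the suffixed key, then delete the original.
        s3.copy_object(Bucket=bucket,
                       CopySource={"Bucket": bucket, "Key": key},
                       Key=key + ".gz")
        s3.delete_object(Bucket=bucket, Key=key)

The endswith guard matters: without it, the copy re-triggers the PUT notification and the function loops on its own output, which is one way to hit the invocation volume @mrudrara asked about.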
Regards
Hi @Luk3rson, really appreciate it. Meanwhile, while working with an AWS Support engineer, they also recommended "Data Transformation with Lambda".
If the folder only contains gzipped logs, you can instead set the gzip_pattern option on the s3 input (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html#plugins-inputs-s3-gzip_pattern):
gzip_pattern => ".*?$"
That way the input plugin treats every file as gzip, without appending a .gz extension via the Lambda.
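Adapting the s3 input from the original config, that would look something like this (a sketch; bucket, role, and region values as in the first comment):

input {
  s3 {
    bucket => "test"
    region => "us-east-1"
    role_arn => "test"
    interval => 10
    # Treat every object as gzip, regardless of key suffix
    gzip_pattern => ".*?$"
  }
}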