logstash-plugins / logstash-input-s3



S3 input can take a long time to start and a long time to stop

ph opened this issue

When a bucket contains a very large number of files, the plugin can take a long time to start because of all the API calls the code has to make. #25 reduces the number of calls to a reasonable amount by using v2 of the API, but this is still problematic.
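One way to picture the startup cost: if every listing page must be fetched before the first file is processed, startup time grows with the size of the bucket. The following is a minimal Python sketch (the real plugin is written in Ruby), assuming a hypothetical `list_page(token)` callable that returns one page of keys plus a continuation token. It yields keys lazily, so processing can begin after the first page. `make_fake_bucket` is an illustrative in-memory stand-in, not an AWS API.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a continuation token (None for the first
# page) and returns (keys_on_this_page, next_token_or_None).
PageFetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def iter_keys(list_page: PageFetcher) -> Iterator[str]:
    """Yield object keys page by page instead of listing the whole bucket up front."""
    token: Optional[str] = None
    while True:
        keys, token = list_page(token)
        yield from keys
        if token is None:  # last page reached
            return


def make_fake_bucket(n_keys: int, page_size: int = 1000) -> Tuple[PageFetcher, List[int]]:
    """In-memory stand-in for a bucket that also counts how many list calls were made."""
    all_keys = [f"logs/file-{i:06d}.gz" for i in range(n_keys)]
    calls = [0]

    def list_page(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
        calls[0] += 1
        start = int(token) if token else 0
        end = start + page_size
        next_token = str(end) if end < n_keys else None
        return all_keys[start:end], next_token

    return list_page, calls
```

With 2,500 fake keys, taking the first key triggers only one list call; draining the iterator makes three in total.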

The plugin can also take a very long time to stop because its current architecture is single-threaded: listing the remote files, downloading them, uncompressing them, and the actual processing are all done in a single thread.

Stopping the plugin doesn't correctly interrupt this chain.
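Even without restructuring, interrupting the chain mostly means checking for a stop request at every step boundary rather than only after a whole batch of work. A minimal Python sketch, assuming hypothetical `download`, `uncompress`, and `process` callables (they are not the plugin's actual methods) and a `threading.Event` as the stop signal:

```python
import threading
from typing import Callable, Iterable, List


def run_chain(
    keys: Iterable[str],
    download: Callable[[str], bytes],
    uncompress: Callable[[bytes], List[str]],
    process: Callable[[str], None],
    stop: threading.Event,
) -> int:
    """Single-threaded list -> download -> uncompress -> process loop that honours `stop`.

    Returns the number of files fully processed before stopping.
    """
    done = 0
    for key in keys:
        if stop.is_set():  # checked before each file
            break
        data = download(key)
        if stop.is_set():  # checked again after a potentially slow download
            break
        for line in uncompress(data):
            if stop.is_set():
                return done  # the partially processed file is not counted
            process(line)
        done += 1
    return done
```

A single blocking call such as a slow download still delays shutdown until it returns, which is one reason to split the steps into separate stages.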

We need to decouple these parts into separate stages to better control the plugin's flow of execution.
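A staged design along these lines can be sketched with one thread per stage connected by bounded queues. This Python version is an illustration under stated assumptions, not the plugin's implementation: `download` and `decode` are hypothetical callables, and every queue operation uses a short timeout so each stage notices the shared stop event instead of blocking forever on a full or empty queue. Bounded queues also provide backpressure, so listing cannot run arbitrarily far ahead of downloading.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel marking end of stream between stages


def _put(q: queue.Queue, item, stop: threading.Event) -> bool:
    """Put with a timeout loop so a full queue never blocks shutdown."""
    while not stop.is_set():
        try:
            q.put(item, timeout=0.1)
            return True
        except queue.Full:
            continue
    return False


def _get(q: queue.Queue, stop: threading.Event):
    """Get with a timeout loop; returns _DONE once a stop is requested."""
    while not stop.is_set():
        try:
            return q.get(timeout=0.1)
        except queue.Empty:
            continue
    return _DONE


def run_pipeline(keys: Iterable[str], download: Callable[[str], bytes],
                 decode: Callable[[bytes], List[str]], process: Callable[[str], None],
                 stop: threading.Event, maxsize: int = 4) -> None:
    """Run lister -> downloader -> processor as three threads linked by bounded queues."""
    to_download: queue.Queue = queue.Queue(maxsize)
    to_process: queue.Queue = queue.Queue(maxsize)

    def lister() -> None:
        for key in keys:
            if not _put(to_download, key, stop):
                return
        _put(to_download, _DONE, stop)

    def downloader() -> None:
        while True:
            key = _get(to_download, stop)
            if key is _DONE:
                _put(to_process, _DONE, stop)  # propagate end of stream
                return
            if not _put(to_process, download(key), stop):
                return

    def processor() -> None:
        while True:
            data = _get(to_process, stop)
            if data is _DONE:
                return
            for line in decode(data):
                if stop.is_set():
                    return
                process(line)

    threads = [threading.Thread(target=f, daemon=True) for f in (lister, downloader, processor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because each stage has a single thread and the queues are FIFO, output order matches listing order. Adding more downloader threads would increase throughput but give up that ordering.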

I am in the process of merging the logic of #25 and decoupling the code a bit to get better control over execution. The problem with the v2 API is that the way things are mocked has changed a lot since v1.

So I am taking the time to clean things up, to see if I can improve performance and my confidence in the changes.

Also, large files are killing the performance of this plugin.
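One common mitigation for large compressed objects is to decompress and emit lines incrementally rather than materialising the whole file first. A minimal Python sketch, assuming the object body is available as a readable binary file-like object (here an in-memory `io.BytesIO` stands in for it), using the standard library's `gzip.GzipFile`:

```python
import gzip
import io
import threading
from typing import BinaryIO, Iterator


def iter_gzip_lines(stream: BinaryIO, stop: threading.Event) -> Iterator[str]:
    """Decompress and yield lines one at a time instead of reading the whole file.

    Memory use is roughly bounded by the longest line plus internal buffers,
    not by the file size, and a stop request is noticed between lines.
    """
    with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
        for raw in gz:
            if stop.is_set():
                return
            yield raw.decode("utf-8").rstrip("\n")


if __name__ == "__main__":
    # Stand-in for a downloaded object body.
    body = io.BytesIO(gzip.compress(b"first event\nsecond event\n"))
    for line in iter_gzip_lines(body, threading.Event()):
        print(line)
```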

Any update on this? We keep all of our CloudTrail logs in an S3 bucket, and as the months have gone by, the plugin has become more and more difficult to rely on, since it's now checking months of CloudTrail log files.
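Because CloudTrail writes objects under date-based keys (`AWSLogs/<account-id>/CloudTrail/<region>/YYYY/MM/DD/...`), a lister could in principle restrict each list request to recent days so that older months are never enumerated. A minimal Python sketch of building those prefixes, assuming that default key layout; the account ID and region below are placeholders, and wiring such prefixes into the plugin is a hypothetical approach, not a feature the plugin is known to offer:

```python
from datetime import date, timedelta
from typing import List


def recent_cloudtrail_prefixes(account_id: str, region: str, today: date, days: int) -> List[str]:
    """Build one listing prefix per day for the last `days` days, newest first."""
    return [
        f"AWSLogs/{account_id}/CloudTrail/{region}/{(today - timedelta(days=i)):%Y/%m/%d}/"
        for i in range(days)
    ]
```

For example, `recent_cloudtrail_prefixes("123456789012", "us-east-1", date(2016, 3, 1), 2)` covers 1 March and 29 February 2016, since 2016 is a leap year.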