S3 input can take a long time to start and a long time to stop

Question

ph opened this issue 8 years ago · comments

Pier-Hugues Pellerin commented 8 years ago

When you have a bucket with a really large number of files, it can take a while to start because of all the API calls the code has to make. #25 reduces the number of calls to a reasonable amount by using v2 of the API, but this is still problematic.

The plugin can also take a really long time to stop. The current architecture of the plugin is single-threaded, which means that listing the remote files, downloading, decompressing, and the actual processing are all done in a single thread.

The stop doesn't correctly interrupt this chain.

We need to decouple these parts into different stages to better control the flow of execution of this plugin.
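
To make the idea of decoupled stages concrete, here is a minimal sketch in Python. It is not the plugin's code: the names `lister`, `stage`, `run_pipeline`, `put_or_stop` and `get_or_stop` are hypothetical, and `fetch` and `process` stand in for the real download/decompress and event-processing steps. Each stage runs in its own thread, stages are connected by small bounded queues, and every blocking call polls a shared stop flag so a stop request interrupts the whole chain within one poll interval.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream between stages


def put_or_stop(q, item, stop, timeout=0.05):
    """Put item on a bounded queue; give up (return False) once stop is set."""
    while not stop.is_set():
        try:
            q.put(item, timeout=timeout)
            return True
        except queue.Full:
            continue
    return False


def get_or_stop(q, stop, timeout=0.05):
    """Return the next item, or SENTINEL once stop is set."""
    while not stop.is_set():
        try:
            return q.get(timeout=timeout)
        except queue.Empty:
            continue
    return SENTINEL


def lister(keys, out_q, stop):
    """First stage: feed object keys downstream (stands in for the remote listing)."""
    for key in keys:
        if not put_or_stop(out_q, key, stop):
            return
    put_or_stop(out_q, SENTINEL, stop)


def stage(work, in_q, out_q, stop):
    """Generic middle stage: apply work() to each item until end of stream or stop."""
    while True:
        item = get_or_stop(in_q, stop)
        if item is SENTINEL:
            put_or_stop(out_q, SENTINEL, stop)
            return
        if not put_or_stop(out_q, work(item), stop):
            return


def run_pipeline(keys, fetch, process, stop, results):
    """Wire list -> fetch -> process and collect output until done or stopped."""
    key_q, body_q, done_q = (queue.Queue(maxsize=4) for _ in range(3))
    threads = [
        threading.Thread(target=lister, args=(keys, key_q, stop)),
        threading.Thread(target=stage, args=(fetch, key_q, body_q, stop)),
        threading.Thread(target=stage, args=(process, body_q, done_q, stop)),
    ]
    for t in threads:
        t.start()
    while True:
        item = get_or_stop(done_q, stop)
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()  # returns quickly: every stage checks the stop flag
    return results


if __name__ == "__main__":
    stop = threading.Event()
    # Placeholder fetch/process functions; a real plugin would download and parse here.
    print(run_pipeline(["a.log", "b.log"], lambda k: k + ":body", str.upper, stop, []))
```

The bounded queues keep the listing from running far ahead of the download and processing stages, and calling `stop.set()` from any thread lets all stages exit without waiting for the full listing to finish. A long-running `fetch` or `process` call would still need its own check of the stop flag to be interruptible mid-file.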

Pier-Hugues Pellerin · Answer 1 · Wed Apr 06 2016 22:25:49 GMT+0800 (China Standard Time)

I am in the process of merging the logic of #25 and decoupling the code a bit to have better control of the execution. The problem with the v2 API is that the way things are mocked has changed a lot since v1.

So I am taking the time to clean things up to see if I can improve performance and my confidence in the changes.

Pier-Hugues Pellerin · Answer 2 · Wed Apr 06 2016 22:31:17 GMT+0800 (China Standard Time)

Also, large files are killing the performance of this plugin.

Kyle Gochenour · Answer 3 · Wed Jul 12 2017 05:35:34 GMT+0800 (China Standard Time)

Any update on this? We keep all of our CloudTrail logs in an S3 bucket, and as the months have gone by the plugin has become more and more difficult to rely on, since it's now checking months of CloudTrail log files.