logstash-plugins / logstash-input-s3

Add single run option (interval:0)

pranspach opened this issue

I'd like to be able to stream files from S3 in a single batch, instead of polling on a persistent watch interval.

For context, I am launching an S3->ES logstash docker container from an Airflow DAG.

Adding an interval:0 case feels intuitive and satisfies my needs. Would a PR for this be appreciated?

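  public
  def run(queue)
    @current_thread = Thread.current
    if @interval == 0
      # Proposed: an interval of 0 means a single pass, then return.
      process_files(queue)
    else
      # Existing behavior: keep polling on the configured interval.
      Stud.interval(@interval) do
        process_files(queue)
      end
    end
  end # def run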

@pranspach I would definitely appreciate a PR that adds one-shot processing that closes down the input when it completes.

There will likely be a bit of extra complexity in handling interrupts and stop sequence, since the plugin currently uses Stud#stop! to interrupt the Stud::Interval. It may be simple enough to wrap the one-off execution in a Stud::Task that we then Stud::Task#wait on to get the stopping semantics without changing anything else.

Or, we can simply interrupt the Stud::Interval after the first execution by sending Stud#stop!
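
To make the stop-sequence concern concrete, here is a minimal sketch in plain Ruby of a one-shot pass that can still be interrupted between files. It deliberately does not use Stud. `OneShotRunner`, `stop!`, `stop?`, and the file names are hypothetical and are not taken from the plugin or from the stud gem.

```ruby
# Hypothetical stand-in for the input plugin; none of these names come from
# logstash-input-s3 or the stud gem.
class OneShotRunner
  def initialize(files)
    @files = files
    @stop_requested = false
    @lock = Mutex.new
  end

  # Ask the runner to stop before it picks up the next file.
  def stop!
    @lock.synchronize { @stop_requested = true }
  end

  def stop?
    @lock.synchronize { @stop_requested }
  end

  # Process each file once, push an event per file, then return.
  # Returns the number of files actually processed.
  def run(queue)
    processed = 0
    @files.each do |name|
      break if stop? # honor a stop request between files
      queue << "event from #{name}"
      processed += 1
    end
    processed
  end
end

queue = Queue.new
runner = OneShotRunner.new(%w[a.log b.log c.log])
worker = Thread.new { runner.run(queue) }
processed = worker.value # blocks until the single pass has finished
```

Waiting on the worker thread plays roughly the role that waiting on a task would play in the suggestion above: the caller gets a clear "done" signal, and a stop request issued mid-pass still takes effect before the next file.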


Using a zero-interval may be overloading the parameter a bit -- my natural assumption when seeing an interval of 0 would be that the input would simply look for more as soon as it is done processing what was present last time. A negative interval would be a slightly clearer way of indicating "this is not normal; read the docs".

Alternatively, we could use a separate parameter, like watch_for_new_files => "false"?
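
As a rough illustration of why a separate flag reads more clearly than a special interval value, the sketch below dispatches on a boolean and leaves the interval meaning what it says. It is plain Ruby under stated assumptions: `PollingInput`, `passes`, and `stop!` are hypothetical names, and this is not the code that was eventually merged.

```ruby
# Hypothetical sketch; class and helper names are illustrative only.
class PollingInput
  attr_reader :passes

  def initialize(interval:, watch_for_new_files: true)
    @interval = interval
    @watch_for_new_files = watch_for_new_files
    @stopped = false
    @passes = 0
  end

  def stop!
    @stopped = true
  end

  def run(queue)
    loop do
      process_files(queue)
      # One-shot mode exits after the first pass, whatever the interval is.
      break if !@watch_for_new_files || @stopped
      sleep(@interval) # an interval of 0 simply means "poll again right away"
    end
  end

  private

  # Stand-in for listing and reading objects; records one event per pass.
  def process_files(queue)
    @passes += 1
    queue << "pass #{@passes}"
  end
end

queue = Queue.new
one_shot = PollingInput.new(interval: 60, watch_for_new_files: false)
one_shot.run(queue) # returns after a single pass without sleeping
```

With this shape, an interval of 0 keeps the reading described above (look again immediately), and the one-shot behavior is opted into explicitly.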

Thanks @yaauie . I liked the watch_for_new_files parameter suggestion. Let me know if there are any additional changes/feedback/discussion. I'd love to have this functionality in the plugin.

@pranspach merged in #162 and released in v3.4.0 😄