logstash-plugins / logstash-input-s3

Multiple Broker/Indexers ingesting

cdenneen opened this issue · comments

I'd like to ask how you would run multiple Logstash servers (for HA) pulling from the same S3 input when they don't share a sincedb_path.
Could you use an NFS/GFS filesystem and have more than one instance of Logstash using the same sincedb file?
This might not even be possible, but it would be really helpful for the case where an S3 input thread dies (with Logstash still running) and ingestion has stopped.
Obviously, fixing the S3 input thread so it doesn't die is the correct fix, but for HA, if the LS node died it would be nice to have two running, so the node could be repaired without downstream data loss/backup/delay.
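For concreteness, the setup being asked about would look roughly like this (the bucket name and mount path are hypothetical). Both nodes point `sincedb_path` at the same shared mount; the open question is whether anything coordinates their writes to it:

```
input {
  s3 {
    bucket       => "my-app-logs"              # hypothetical bucket
    region       => "us-east-1"
    sincedb_path => "/mnt/nfs/s3.sincedb"      # shared NFS mount, used by both nodes
  }
}
```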

Any update on this?

Happy one-year anniversary.

Logstash currently has no mechanism for internode coordination (two logstash nodes coordinating work efforts), and my best guess is that we would need a coordination mechanism in order to achieve what is proposed in this issue. The one external system that this input knows about is S3, and as far as I can tell, S3 can't be used for coordination because it lacks atomic operations that could make coordination possible.

At this time, I don't have a solution, so this issue will wait until someone can come up with a solution that other S3 input users find agreeable.

If the sincedb path pointed to a shared source, would Logstash honor it, or would there be collisions?
I think if we can't do proper locking, the only other option would be to use some other external source as the sincedb to enable a coordination mechanism. Whether for file or S3 inputs, the external resource would manage which broker grabs which file.
Extending the sincedb logic is the only suggestion I can think of to make inputs like this redundant across multiple brokers.
It seems like a large undertaking if someone wants to suggest some pluggable options (Mongo, DynamoDB, a database, etc.).
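To make the coordination idea concrete, here is a minimal Ruby sketch of a claim-based scheme. The `LockStore` below is an in-memory stand-in for an external store: with DynamoDB, the same semantics would come from a `put_item` with a `condition_expression` of `attribute_not_exists(object_key)`, which fails atomically when another node already holds the claim. All class and method names here are hypothetical, not part of this plugin.

```ruby
class LockStore
  def initialize
    @claims = {}
    @mutex  = Mutex.new
  end

  # Atomically claim an object key for a node. Returns true if the
  # claim succeeded, false if another node already owns the key.
  def try_claim(object_key, node_id)
    @mutex.synchronize do
      return false if @claims.key?(object_key)
      @claims[object_key] = node_id
      true
    end
  end
end

# Each node lists the bucket and processes only the keys it wins.
def process_new_objects(store, node_id, keys)
  keys.select { |key| store.try_claim(key, node_id) }
end

store = LockStore.new
keys  = %w[logs/a.gz logs/b.gz logs/c.gz]

node1 = process_new_objects(store, "logstash-1", keys)
node2 = process_new_objects(store, "logstash-2", keys)
# Across both nodes, every key is claimed exactly once.
```

The important property is that the claim is a single conditional write, so two brokers listing the same bucket can never both win the same key, regardless of timing.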

The Kinesis plugin uses a DynamoDB row that just assigns different data streams to different Logstash instances.

https://github.com/logstash-plugins/logstash-input-kinesis

This isn't the most efficient method, but for deployments with logstash-as-cattle, this type of implementation at least moves the plugin from "we can't use it" to "okay, this will work".
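For comparison, a minimal configuration of that plugin looks roughly like this (the stream and application names are placeholders). The `application_name` identifies the DynamoDB lease table that the Kinesis client library uses to split shards across workers, which is the coordination piece this issue is asking for:

```
input {
  kinesis {
    kinesis_stream_name => "my-log-stream"     # placeholder stream name
    application_name    => "logstash-ingest"   # names the shared DynamoDB lease table
    region              => "us-east-1"
  }
}
```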

@codekitchen @robbavey can anyone tackle adding the dynamodb alternative to sincedb to this input?

Yeah, that would be awesome: move the sincedb to an "outside" system so that we can treat the s3 input as cattle.

We've thought about utilizing EFS to achieve this, but for the time being we settled on just boosting our Logstash workers/mem/CPU/batch size. But we need more.

UPDATE:
We will be trying out the logstash-input-s3-sns-sqs plugin (if we can get it to work with our system). This should allow us to scale horizontally without worrying about the same files being processed by each new container.
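A rough sketch of that pipeline, assuming the plugin registers as `s3snssqs` and exposes a `queue` option (these are from memory; check the plugin's README): the bucket publishes event notifications to SNS, SNS fans out to an SQS queue, and multiple Logstash consumers poll the queue, with SQS's visibility timeout ensuring each object notification is handed to one consumer at a time.

```
input {
  s3snssqs {
    queue  => "logstash-s3-events"   # SQS queue subscribed to the bucket's SNS topic
    region => "us-east-1"
  }
}
```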