optiopay / kafka

Go driver for Kafka

Home Page: https://godoc.org/github.com/optiopay/kafka


optiopay/kafka to use exponential backoff on nodes that are down

keep94 opened this issue

Background:

The Kafka cluster that I have to write to has 5 nodes, one of which is permanently down. I write to the nodes round-robin. Every 5th write is certain to fail, but it takes a long time to fail because of all the retries. As a result, this one failed node reduces my overall write throughput by several orders of magnitude.

I suggest that when optiopay/kafka encounters a connection refused or similar error on a node, it apply some kind of exponential backoff strategy rather than aggressively retrying that node each time. If one of the 5 nodes is down for a very long time, optiopay/kafka should write only to the other 4 and try the down node only occasionally.
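To make the suggestion concrete, here is a minimal sketch of per-node exponential backoff bookkeeping. It is not part of optiopay/kafka: the `backoffTable` type, its methods, the broker addresses, and the 100 ms / 30 s delays are all hypothetical names and values chosen for illustration, and the "write" is simulated rather than sent to a broker.

```go
package main

import (
	"fmt"
	"time"
)

// nodeBackoff records the failure state of one node.
// Hypothetical type; not part of the optiopay/kafka API.
type nodeBackoff struct {
	failures int       // consecutive failures seen so far
	retryAt  time.Time // earliest time the node may be tried again
}

// backoffTable tracks only the nodes that are currently failing.
type backoffTable struct {
	base, max time.Duration
	nodes     map[string]*nodeBackoff
}

func newBackoffTable(base, max time.Duration) *backoffTable {
	return &backoffTable{base: base, max: max, nodes: make(map[string]*nodeBackoff)}
}

// delay returns base doubled once per failure after the first, capped at max.
func (bt *backoffTable) delay(failures int) time.Duration {
	d := bt.base
	for i := 1; i < failures; i++ {
		d *= 2
		if d >= bt.max {
			return bt.max
		}
	}
	if d > bt.max {
		return bt.max
	}
	return d
}

// Available reports whether the node may be tried at time now.
func (bt *backoffTable) Available(addr string, now time.Time) bool {
	n, ok := bt.nodes[addr]
	return !ok || !now.Before(n.retryAt)
}

// Failed records one more consecutive failure and pushes retryAt further out.
func (bt *backoffTable) Failed(addr string, now time.Time) {
	n := bt.nodes[addr]
	if n == nil {
		n = &nodeBackoff{}
		bt.nodes[addr] = n
	}
	n.failures++
	n.retryAt = now.Add(bt.delay(n.failures))
}

// Succeeded clears the failure history of a node.
func (bt *backoffTable) Succeeded(addr string) {
	delete(bt.nodes, addr)
}

func main() {
	// Hypothetical broker addresses; kafka5 plays the permanently down node.
	nodes := []string{"kafka1:9092", "kafka2:9092", "kafka3:9092", "kafka4:9092", "kafka5:9092"}
	bt := newBackoffTable(100*time.Millisecond, 30*time.Second)
	now := time.Now()
	for i := 0; i < 10; i++ {
		addr := nodes[i%len(nodes)]
		if !bt.Available(addr, now) {
			fmt.Println("skip", addr) // still backing off, go to the next node
			continue
		}
		if addr == "kafka5:9092" { // simulated connection refused
			bt.Failed(addr, now)
			fmt.Println("fail", addr)
			continue
		}
		bt.Succeeded(addr)
		fmt.Println("ok  ", addr)
	}
}
```

With bookkeeping like this, a round-robin writer pays the cost of the dead node once per backoff window instead of on every 5th write.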

Thoughts?

I see your point, but I cannot suggest anything quickly.

What you're suggesting might be too hard to implement to be worth having in the current API. I think the real problem is not in the driver's retry policy, but in how the distributed producers are implemented.

If you don't care where the message goes as long as it's fast, you could write a custom distributed producer that uses a channel to multiplex all writes and distributes them to any available node.
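A rough sketch of that idea, under stated assumptions: `writeFunc` is a hypothetical stand-in for whatever per-node write call the application already has (it is not an optiopay/kafka type), and `distribute` is an illustrative name. One goroutine per node pulls from a shared channel, so fast nodes naturally take more messages. A node whose write fails hands the message back and, in this simplified version, is not used again for the rest of the batch.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// writeFunc sends one message to one node. Hypothetical stand-in for the
// application's real per-node producer call.
type writeFunc func(msg []byte) error

var errNodeDown = errors.New("connection refused")

// distribute delivers msgs through whichever nodes accept them and returns
// the number of messages delivered.
func distribute(msgs [][]byte, nodes []writeFunc) int {
	if len(msgs) == 0 {
		return 0
	}
	// Buffered to len(msgs) so handing a message back never blocks.
	queue := make(chan []byte, len(msgs))
	for _, m := range msgs {
		queue <- m
	}
	remaining := int64(len(msgs))

	var wg sync.WaitGroup
	for _, write := range nodes {
		wg.Add(1)
		go func(write writeFunc) {
			defer wg.Done()
			for m := range queue {
				if err := write(m); err != nil {
					queue <- m // let another node take this message
					return     // stop using the failing node
				}
				// The worker that delivers the last message closes the
				// queue; no worker can still hold a message at that point.
				if atomic.AddInt64(&remaining, -1) == 0 {
					close(queue)
				}
			}
		}(write)
	}
	wg.Wait()
	return len(msgs) - int(atomic.LoadInt64(&remaining))
}

func main() {
	up := func(msg []byte) error { return nil }           // simulated healthy node
	down := func(msg []byte) error { return errNodeDown } // simulated dead node

	msgs := make([][]byte, 20)
	for i := range msgs {
		msgs[i] = []byte(fmt.Sprintf("message %d", i))
	}
	n := distribute(msgs, []writeFunc{up, up, up, up, down})
	fmt.Printf("delivered %d of %d messages\n", n, len(msgs))
}
```

Messages may be written in a different order than they were queued, which fits only the "don't care where the message goes" case. A longer-lived producer would presumably bring a failed node back after a delay rather than retiring it for good.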

I will think about it, but if you have any more ideas, please share them.