Idle timeout for etcd should be at least 1 hour
r7vme opened this issue · comments
Many services "watch" etcd, so connections to etcd are expected to stay open for longer than 60 seconds (the current ELB idle timeout).
I see two solutions:
1. increase the idle timeout to 1 hour
2. switch from the ELB to a DNS CNAME that points directly to the master node
This timeout issue is probably the root cause of the Calico/Confd issue where new nodes cannot join Calico because existing nodes missed events from etcd. https://github.com/giantswarm/giantswarm/issues/1687#issuecomment-328551514
That happens in customer guest clusters periodically.
Option 2 (the DNS CNAME) is not possible atm (we discussed this in sig-updates), so let's start with option 1 (raising the idle timeout). It should be a super simple hack.
There are some PRs by Tim from IC Consult that will make these timeouts configurable through the TPR:
https://github.com/giantswarm/awstpr/pull/45/files
Yes the change from Tim @ IC Consult is to use the same timeout for all 3 ELBs. If there is no need to have separate values then I think we can go with that.
Oh, I thought these were separate values. I would vote for separate values, as I'm not so sure we want to just increase the timeout to the maximum (60 min) for all ELBs.
The awstpr change does have separate timeouts. Ignore me on this!
Idle timeout set to 3600 secs in #445
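For anyone landing here later, a rough sketch of what separate per-ELB idle timeouts could look like outside the TPR, using the Classic ELB `modify_load_balancer_attributes` call from boto3. This is not the change from #445. Only the 3600 second value for etcd comes from this thread; the `api` and `ingress` roles, their 60 second values, and the `<prefix>-<role>` naming scheme are assumptions for illustration.

```python
# Hypothetical per-ELB idle timeouts in seconds. Only the etcd value (3600)
# comes from this thread; the other roles and values are placeholders.
IDLE_TIMEOUTS = {
    "etcd": 3600,
    "api": 60,
    "ingress": 60,
}

# Classic ELB documents an idle timeout range of 1 to 4000 seconds.
MIN_IDLE_TIMEOUT = 1
MAX_IDLE_TIMEOUT = 4000


def idle_timeout_attributes(seconds):
    """Build the LoadBalancerAttributes payload for a Classic ELB."""
    if not MIN_IDLE_TIMEOUT <= seconds <= MAX_IDLE_TIMEOUT:
        raise ValueError(f"idle timeout out of range: {seconds}")
    return {"ConnectionSettings": {"IdleTimeout": seconds}}


def apply_idle_timeouts(elb_client, name_prefix, timeouts=IDLE_TIMEOUTS):
    """Set the idle timeout on each ELB and return the names that were updated.

    elb_client is expected to behave like boto3.client("elb"). The
    "<prefix>-<role>" naming scheme is an assumption, not the real one.
    """
    updated = []
    for role, seconds in timeouts.items():
        name = f"{name_prefix}-{role}"
        elb_client.modify_load_balancer_attributes(
            LoadBalancerName=name,
            LoadBalancerAttributes=idle_timeout_attributes(seconds),
        )
        updated.append(name)
    return updated
```

With a real client this would be called as `apply_idle_timeouts(boto3.client("elb"), "my-cluster")`. Keeping the values in one mapping means the etcd ELB can get the long timeout that watches need while the other ELBs keep a short one.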