giantswarm / aws-operator

Manages Kubernetes clusters running on AWS (before Cluster API)

Home Page:https://www.giantswarm.io/

Idle timeout for etcd should be at least 1 hour

r7vme opened this issue · comments

commented

Many services "watch" etcd, so connections to etcd are expected not to be dropped after 60 seconds of idling (the current ELB idle timeout).

I see two solutions:

  1. increase idle timeout to 1 hour
  2. switch from ELB to a DNS CNAME that points directly to the master node

This timeout issue is probably the root cause of the Calico/Confd issue where new nodes cannot join Calico because existing nodes missed events from etcd: https://github.com/giantswarm/giantswarm/issues/1687#issuecomment-328551514

That happens in customer guest clusters periodically.

cc: @teemow @puja108 @rossf7

Option 2 is not possible atm (we discussed this on sig-updates), so let's start with option 1; it should be a super simple hack.

commented

There are some PRs by Tim from IC Consult that will make these timeouts configurable through the TPR:
https://github.com/giantswarm/awstpr/pull/45/files

Yes, the change from Tim @ IC Consult is to use the same timeout for all 3 ELBs. If there is no need to have separate values then I think we can go with that.

commented

Oh, I thought these were separate values. I would vote for separate values, as I'm not sure we want to just increase to the maximum (60 min) for all ELBs.

The awstpr change does have separate timeouts. Ignore me on this!

Idle timeout set to 3600 secs in #445
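
For reference, a minimal sketch of what separate per-ELB idle timeouts could look like. This is not the code from #445 or the awstpr PR. The role names (`etcd`, `api`, `ingress`) and the 60-second values for the non-etcd ELBs are assumptions for illustration; only the 3600-second etcd value comes from this thread. The sketch only builds the attribute payload and does not call AWS.

```python
# Hypothetical per-ELB idle timeouts in seconds. Only the etcd value (3600)
# comes from this thread; the role names and the other values are placeholders.
IDLE_TIMEOUT_SECONDS = {
    "etcd": 3600,
    "api": 60,
    "ingress": 60,
}


def connection_settings(role, timeouts=IDLE_TIMEOUT_SECONDS):
    """Build the idle-timeout attribute payload for one Classic ELB role."""
    seconds = timeouts[role]
    if not isinstance(seconds, int) or seconds < 1:
        raise ValueError(f"idle timeout for {role!r} must be a positive integer")
    # Shape of the Classic ELB "ConnectionSettings" attribute.
    return {"ConnectionSettings": {"IdleTimeout": seconds}}


if __name__ == "__main__":
    for role in IDLE_TIMEOUT_SECONDS:
        print(role, connection_settings(role))
```

The returned dict has the shape that the Classic ELB `ModifyLoadBalancerAttributes` API expects for `LoadBalancerAttributes` (for example via boto3's `elb` client and `modify_load_balancer_attributes`), so each ELB can get its own value instead of raising all three to the maximum.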