deis / router

Edge router for Deis Workflow

Home Page: https://deis.com

Optimize the control loop

krancour opened this issue

Currently, the router does all of the following every ten seconds:

  1. Find the router's own deployment object.
  2. Find all routable services.
  3. Find secrets that contain certificates.

These are the inputs used to compute a model. This model is deep-compared to the previously computed model, which is kept in memory. When the two differ, the Nginx configuration is regenerated by using the new model as input to a template, and Nginx is reloaded with the new configuration.

Note that the model is computed before any comparison is made so that inconsequential changes to k8s resources (those that wouldn't affect the router's configuration) won't trigger an unnecessary Nginx reload.
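To make the current behavior concrete, here is a rough sketch of one pass of that loop. It is in Python for brevity and is not the router's actual code: the `Model` fields, the dictionary keys (`routable`, `domain`, `address`, `has_cert`), and the `render` / `reload` callbacks are all hypothetical, and the deployment lookup is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical stand-in for the router's configuration model."""
    # Maps a domain name to the backend address that serves it.
    routes: dict = field(default_factory=dict)
    # Names of the secrets that hold certificates.
    certs: list = field(default_factory=list)


def build_model(services, secrets):
    """Keep only the fields that would affect the Nginx configuration."""
    routes = {s["domain"]: s["address"] for s in services if s.get("routable")}
    certs = sorted(x["name"] for x in secrets if x.get("has_cert"))
    return Model(routes=routes, certs=certs)


def reconcile(previous, services, secrets, render, reload):
    """Run one pass of the loop and return the model to keep in memory."""
    current = build_model(services, secrets)
    # Dataclass equality compares the nested dict and list by value,
    # which serves as the deep comparison here.
    if current == previous:
        return previous
    reload(render(current))
    return current
```

Because irrelevant fields never make it into the model, a change to (say) an unrelated annotation yields an equal model and no reload. The cost is that every pass still fetches everything and rebuilds the model.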

Here's where there's room to optimize: continuously computing the model seems wasteful, since changes that affect the router's configuration are relatively rare. It not only wastes CPU cycles but, depending on how many routable services live in your cluster, can also be very chatty with the apiserver, which puts unnecessary load on the network.

A more mature approach may be to watch the k8s event stream instead. Only when a k8s resource we're interested in (as determined by a label or well-known name) has been added or modified should we retrieve those resources and re-compute the model. What's more, because we'd know what changed before re-computing, we could re-compute and replace just the affected portion of the existing model.
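A minimal sketch of that idea, assuming events arrive as plain dictionaries with a `type` of `ADDED`, `MODIFIED`, or `DELETED` and the affected object attached. The label name `example.com/routable` and the object fields are placeholders for illustration, not the router's real ones. Deletions and label removals are handled too, since either should drop the entry.

```python
# Hypothetical label marking a service as routable.
ROUTABLE_LABEL = "example.com/routable"


def apply_event(model, event):
    """Update only the affected entry of `model` (service name -> route).

    Returns True when the model changed, i.e. when a reload is warranted.
    """
    obj = event["object"]
    name = obj["name"]
    wanted = (
        event["type"] != "DELETED"
        and obj.get("labels", {}).get(ROUTABLE_LABEL) == "true"
    )
    if not wanted:
        # Deleted, or no longer labeled: drop it if we were tracking it.
        return model.pop(name, None) is not None
    entry = (obj["domain"], obj["address"])
    if model.get(name) == entry:
        # The resource changed in a way that does not affect routing.
        return False
    model[name] = entry
    return True
```

The return value plays the role the deep comparison plays today: Nginx would only be reloaded when `apply_event` reports a real change.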

cc @arschles -- I feel like if there's any obvious problem with this approach, you'd be the guy to spot it. 😉

related issue that this may resolve: #212

@bacongobbler I'm not sure about it solving #212... the reason is that when the router first starts, it needs to compute the entire config model. Only after that can we watch the event stream and compute just the deltas.

Ah yeah I guess the base case wouldn't be resolved, but rather every subsequent case. Good call.

FWIW I think this is a great idea regardless of whether it fixes #212, but I don't have enough experience with the k8s event API to know if this'll work in the way we intend. I imagine @arschles or yourself know better than me. Would be more than happy to take a crack at this though!

@arschles and I both felt this would work. He can speak to it better than I, but he did say something about possible problems with the k8s client disconnecting. There's reconnect logic in Steward that covers that base. Would have to do the same here.
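For illustration, reconnect logic along these lines might look like the sketch below. This is not Steward's code. `open_stream` is a hypothetical callable that opens a watch starting from a given resource version and yields events, and the assumption is that each event carries a `resource_version` we can resume from. `max_attempts` exists only so the loop can be bounded in a test.

```python
import time


def watch_with_reconnect(open_stream, handle, sleep=time.sleep,
                         max_backoff=30.0, max_attempts=None):
    """Consume a watch stream, reconnecting whenever it drops or ends."""
    backoff = 1.0
    last_version = None  # resume point; None means "start fresh"
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            for event in open_stream(last_version):
                handle(event)
                last_version = event["resource_version"]
                backoff = 1.0  # the stream is healthy, so reset the delay
        except ConnectionError:
            pass  # fall through and reconnect
        # Wait before reconnecting, doubling the delay up to a cap.
        sleep(backoff)
        backoff = min(backoff * 2, max_backoff)
```

If the saved resource version turns out to be too old for the apiserver to resume from, a real implementation would need to fall back to a full re-list. That case is not covered here.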

Experience working on the Kubernetes Service Catalog's controller has taught me that what we need to implement here is the so-called "informer pattern," which is the common pattern implemented by Kubernetes controllers. (The router, essentially, is a Kubernetes controller.)
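Roughly, the pattern is: list everything once, keep the results in a local cache, then apply watch events to that cache and notify a handler on real changes. The toy sketch below shows the shape of it. It is not the client-go API; `list_fn`, `watch_fn`, and `on_change` are made-up names, and resyncs and reconnects are omitted.

```python
class Informer:
    """Toy list-then-watch loop with a local cache of objects by name."""

    def __init__(self, list_fn, watch_fn, on_change):
        self.list_fn = list_fn      # returns the current objects
        self.watch_fn = watch_fn    # yields events after the list
        self.on_change = on_change  # called with a snapshot of the cache
        self.cache = {}

    def run(self):
        # Initial list: the full build the router needs at startup.
        for obj in self.list_fn():
            self.cache[obj["name"]] = obj
        self.on_change(dict(self.cache))

        # Afterwards, only deltas are applied to the cache.
        for event in self.watch_fn():
            obj = event["object"]
            name = obj["name"]
            if event["type"] == "DELETED":
                changed = self.cache.pop(name, None) is not None
            else:  # ADDED or MODIFIED
                changed = self.cache.get(name) != obj
                self.cache[name] = obj
            if changed:
                self.on_change(dict(self.cache))
```

In the router's case, `on_change` would be where the model is updated and Nginx reloaded. The initial list is also why the startup case discussed earlier in the thread still needs a full computation.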

This issue was moved to teamhephy/router#17