envoyproxy / java-control-plane

Java implementation of an Envoy gRPC control plane

Support hashing on resources field in addition to Node in NodeGroup

mpuncel opened this issue · comments

I've been thinking about how to improve the performance of our control plane, and I've realized that with EDS in particular, the control plane does a lot of work generating ClusterLoadAssignments that could theoretically be shared for many different Nodes.

If 3 different microservices depend on the "foo" cluster for example, we can likely generate just one ClusterLoadAssignment for "foo" one time and share it for all.

Right now we're using ADS for everything. I'm thinking about using ADS just for Listeners, Routes, and Clusters, and running a separate EDS-only control plane that creates a NodeGroup based on the ClusterLoadAssignment resource names (from the DiscoveryRequest) in addition to the Node.

What do y'all think about me adding that capability?

Stated another way, instead of only providing

public interface NodeGroup<T> {
  T hash(Node var1);
}

We could also allow implementing

public interface NodeAndResourceNamesGroup<T> {
  T hash(Node var1, Set<String> resourceNames);
}
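
To make the proposal concrete, here is a minimal sketch of what an implementation of the proposed interface might look like for an EDS-only control plane. The `Node` class is a hypothetical stand-in for Envoy's `Node` proto so the example compiles on its own, and `ResourceNamesOnlyGroup` is an illustrative name, not part of java-control-plane.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for Envoy's Node proto, so the sketch compiles on its own.
final class Node {
  private final String cluster;
  Node(String cluster) { this.cluster = cluster; }
  String getCluster() { return cluster; }
}

// The interface proposed above, repeated here to keep the block self-contained.
interface NodeAndResourceNamesGroup<T> {
  T hash(Node node, Set<String> resourceNames);
}

// Groups EDS requests purely by the requested cluster names and ignores the Node,
// so every proxy that asks for the same set of clusters lands in the same group.
final class ResourceNamesOnlyGroup implements NodeAndResourceNamesGroup<List<String>> {
  @Override
  public List<String> hash(Node node, Set<String> resourceNames) {
    // Sort the names so the key does not depend on the set's iteration order.
    return List.copyOf(new TreeSet<>(resourceNames));
  }
}
```

With a group key like this, two different microservices that both request the "foo" cluster would map to the same group and could be served the same ClusterLoadAssignment.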

Just a heads up: if you drop ADS for EDS, Envoy will initiate a separate stream per EDS request. Assuming you've got 1000 clusters in Envoy, you will see 1000 streams (one for each cluster). Would this be OK in your case?

I think you will be better off staying on ADS and sharing the generated ClusterLoadAssignment between snapshots.
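
One way to share generated ClusterLoadAssignments between snapshots is to memoize them per cluster name and reuse the same object in every snapshot that needs it. The sketch below assumes that approach; `LoadAssignment` is a hypothetical stand-in for the `ClusterLoadAssignment` proto and `SharedAssignments` is an illustrative name, not an existing java-control-plane class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the ClusterLoadAssignment proto.
final class LoadAssignment {
  final String clusterName;
  LoadAssignment(String clusterName) { this.clusterName = clusterName; }
}

// Builds each cluster's assignment at most once and hands the same instance
// to every snapshot that asks for it, until the entry is invalidated.
final class SharedAssignments {
  private final Map<String, LoadAssignment> byCluster = new ConcurrentHashMap<>();
  private final Function<String, LoadAssignment> builder;

  SharedAssignments(Function<String, LoadAssignment> builder) {
    this.builder = builder;
  }

  // Returns the cached assignment, building it on first use.
  LoadAssignment get(String clusterName) {
    return byCluster.computeIfAbsent(clusterName, builder);
  }

  // Call when the endpoints of a cluster change so the next get() rebuilds it.
  void invalidate(String clusterName) {
    byCluster.remove(clusterName);
  }
}
```

Each per-node snapshot would then call `get("foo")` when assembling its endpoints, so three services that depend on "foo" trigger only one build.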

@mpuncel any updates on this? Have you tried this approach? Any insights that you might share?

Here is another approach, which go-control-plane adopted:
https://github.com/envoyproxy/go-control-plane/blob/master/pkg/cache/v2/linear.go

They introduced LinearCache, which lets you version individual resources. Then there is MuxCache (https://github.com/envoyproxy/go-control-plane/blob/master/pkg/cache/v2/mux.go), which can help you pick the proper cache for a DiscoveryRequest.

Translating this to ADS: everything is still served over ADS. You use LinearCache to serve ClusterLoadAssignments for the whole cluster and the regular SnapshotCache to serve everything else, then mux EDS requests to LinearCache and the rest to SnapshotCache.
This not only solves the problem of sharing the same ClusterLoadAssignments across multiple proxies, but also lets you version individual ClusterLoadAssignments, so a change to one endpoint in one cluster does not trigger updates for every CLA.
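
The two ideas above, per-resource versioning and routing by request type, can be sketched in Java as follows. This is only an illustration loosely modelled on the LinearCache and MuxCache concepts: `VersionedResources` and `TypeUrlMux` are hypothetical names, not java-control-plane or go-control-plane APIs, and the type URL shown assumes the v2 xDS API used by the linked files.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-resource versioned store: each resource remembers the
// value of a global counter at the moment it was last updated.
final class VersionedResources<R> {
  static final class Entry<R> {
    final R resource;
    final long version;
    Entry(R resource, long version) {
      this.resource = resource;
      this.version = version;
    }
  }

  private final Map<String, Entry<R>> entries = new HashMap<>();
  private long nextVersion = 1;

  // Updating one resource bumps only that resource's version.
  synchronized void update(String name, R resource) {
    entries.put(name, new Entry<>(resource, nextVersion++));
  }

  // Returns null when the resource is unknown.
  synchronized Entry<R> get(String name) {
    return entries.get(name);
  }
}

// Hypothetical mux: picks which cache should answer a request by its type URL.
final class TypeUrlMux {
  static final String EDS_TYPE_URL =
      "type.googleapis.com/envoy.api.v2.ClusterLoadAssignment";

  // EDS requests go to the per-resource cache, everything else to snapshots.
  static String route(String typeUrl) {
    return EDS_TYPE_URL.equals(typeUrl) ? "linear" : "snapshot";
  }
}
```

Because versions are tracked per resource, a proxy watching only "bar" sees no version change when an endpoint in "foo" is updated.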