envoyproxy / xds-relay

Caching, aggregation, and relaying for xDS compliant clients and origin servers

Perf issue during fanout

jyotimahapatra opened this issue

We use a map to store requests in the cache (https://github.com/envoyproxy/xds-relay/blob/master/internal/app/cache/cache.go#L52), keyed on the DiscoveryRequest. Because every request is a distinct key, a fanout adds and then deletes a large number of unique entries, and that churn bloats the map: Go maps do not shrink when entries are deleted. This is a known issue with Go maps (here, here).

To test this hypothesis, I adapted the benchmark tests to insert an increasing number of DiscoveryRequests and then remove them, which simulates the fanout scenario (here). Even though the cache ends up holding a single entry, the time per operation rises sharply as more map entries are added and deleted: from about 1.5 µs at MAX_DISCOVERY_REQUESTS=1 to about 4.8 ms at MAX_DISCOVERY_REQUESTS=1000000.
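The effect can be reproduced outside xds-relay with a small standalone program. This is only a sketch, not the benchmark from #196: the `churn` and `countEntries` helpers are hypothetical, integer keys stand in for DiscoveryRequests, and it assumes a Go runtime in which maps keep their allocated buckets after deletes. Timings will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"time"
)

// churn inserts n unique keys into a fresh map and then deletes all but
// key 0, mimicking a fanout that adds and removes many distinct requests.
func churn(n int) map[int]struct{} {
	m := make(map[int]struct{})
	for i := 0; i < n; i++ {
		m[i] = struct{}{}
	}
	for i := 1; i < n; i++ {
		delete(m, i)
	}
	return m
}

// countEntries ranges over the map and returns the number of live entries.
// The range has to walk whatever storage the map still holds, not just
// the entries that remain.
func countEntries(m map[int]struct{}) int {
	count := 0
	for range m {
		count++
	}
	return count
}

func main() {
	for _, peak := range []int{1, 1000, 1000000} {
		m := churn(peak)
		start := time.Now()
		live := countEntries(m)
		fmt.Printf("peak=%d live=%d range took %v\n", peak, live, time.Since(start))
	}
}
```

Every map ends with one live entry, so any difference in the time taken to range over it comes from the peak size the map once reached.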

Benchmarking code: #196

➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op 
BenchmarkCacheRetrieval-8   	  721880	      1509 ns/op	     944 B/op	      12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8   	  809854	      1473 ns/op	     944 B/op	      12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8   	  658707	      1641 ns/op	     944 B/op	      12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8   	  264152	      4144 ns/op	     944 B/op	      12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8   	   50784	     24675 ns/op	     944 B/op	      12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8   	    5220	    222593 ns/op	     944 B/op	      12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1000000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8   	     255	   4825196 ns/op	     944 B/op	      12 allocs/op

A separate benchmark (#198), written from the orchestrator's perspective, shows similar results.

➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8   	   69771	     16503 ns/op	    9408 B/op	      93 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8   	   64796	     16518 ns/op	    9408 B/op	      93 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8   	   68280	     18062 ns/op	    9408 B/op	      93 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8   	   50516	     23984 ns/op	    9408 B/op	      93 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8   	   28072	     41137 ns/op	    9408 B/op	      93 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8   	    4819	    236752 ns/op	    9426 B/op	      93 allocs/op

I updated gcp and removed the use of maps from the cache and downstream (#204).
The benchmark now looks like this:

➜  xds-relay git:(benchnomap) export MAX_DISCOVERY_REQUESTS=1 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns 
   99036	     11910 ns/op
➜  xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=10 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns 
   92912	     12383 ns/op
➜  xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=100 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns 
   94479	     12682 ns/op
➜  xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=1000 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
  106798	     12866 ns/op
➜  xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=10000 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
   94300	     12878 ns/op
➜  xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=100000 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
   12646	     86420 ns/op

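This is an improvement over the current implementation: at MAX_DISCOVERY_REQUESTS=100000 the time drops from 236752 ns/op to 86420 ns/op, and it stays nearly flat at about 12 µs/op up to MAX_DISCOVERY_REQUESTS=10000.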
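For readers wondering what a map-free container might look like, here is one possible shape. It is purely illustrative and is not taken from #204: the `watchers` type and its methods are hypothetical, and `chan string` stands in for whatever response channel type the cache actually uses. It stores entries in a slice and removes them by swapping in the last element, so iterating costs only as much as the number of live entries.

```go
package main

import "fmt"

// watchers is a hypothetical slice-backed set of response channels.
type watchers struct {
	items []chan string
}

// add appends a channel to the set.
func (w *watchers) add(ch chan string) {
	w.items = append(w.items, ch)
}

// remove deletes ch by overwriting it with the last element and
// truncating the slice. Order is not preserved. It reports whether
// ch was present.
func (w *watchers) remove(ch chan string) bool {
	for i, c := range w.items {
		if c == ch {
			last := len(w.items) - 1
			w.items[i] = w.items[last]
			w.items[last] = nil // drop the stale reference
			w.items = w.items[:last]
			return true
		}
	}
	return false
}

func main() {
	w := &watchers{}
	a, b, c := make(chan string), make(chan string), make(chan string)
	w.add(a)
	w.add(b)
	w.add(c)
	removed := w.remove(a)
	fmt.Println("removed:", removed, "remaining:", len(w.items))
}
```

The trade-off is that `remove` scans linearly, so this shape suits sets that are iterated far more often than individual entries are removed.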