solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy

Home Page:https://docs.solo.io/

Provide pprof enabled envoy image for glooe 1.16.2-alpine

DuncanDoyle opened this issue

Gloo Edge Product

Enterprise

Gloo Edge Version

v1.16.2-alpine

Is your feature request related to a problem? Please describe.

We are facing a performance issue with dynamic metadata and need a pprof-enabled image to troubleshoot it.

Describe the solution you'd like

I'd like to get a pprof-enabled image, together with instructions on how to use it to gather CPU profiles during load tests.

Describe alternatives you've considered

No response

Additional Context

https://solo-io.zendesk.com/agent/tickets/3155

referring to these instructions for how to build this

The steps to build an image suitable for collecting CPU profiles are:

  1. build envoy(-gloo-ee) with debug symbols and gperftools
    • debug symbols: pass the -c dbg flag to bazel
    • tcmalloc: pass --define tcmalloc=gperftools to bazel
    • note: we should kick this off this afternoon, so that we will have the binary ready by tomorrow morning
    • note: we must build from the version of envoy-gloo-ee used in the version of gloo EE that the client has installed. In this case, the client is using v1.16.2-alpine, which maps to envoy-gloo-ee v1.27.3-patch1
  2. build envoy(-gloo-ee) docker image
    • note that I'm referring to the docker images built from the envoy-gloo/envoy-gloo-ee repos, not the *-wrapper images built in our control plane repos
    • We need to update this Dockerfile to use alpine as a base image
    • We need to update this Dockerfile so that it sets the CPUPROFILE environment variable to the path where CPU profiles should be written. I used CPUPROFILE=/tmp/mybin.cpuprof, so I think we should probably leave that as-is for this build
  3. build envoy(-gloo-ee) wrapper docker image
    • I am not sure about the state of the v1.16.2-alpine tag in solo-projects, but I imagine we should just be able to update the ENVOY_GLOO_IMAGE_VERSION variable in the Makefile and then run make gloo-ee-envoy-wrapper-docker to build the wrapper image
  4. validate that you can collect profiles from the image, and that the profiles contain meaningful data
    • make sure to collect the profiles by toggling the CPU profiler through the Envoy admin interface (the /cpuprofiler endpoint)
    • make sure you can easily copy the profiles out of the docker container
    • make sure that when you view the profiles in pprof, there are references to lots of different function calls (and not just a single entry for envoy or something like that)
    • we should follow the steps here (please note -- specifically the steps in my LAST comment on the issue) to do this validation: https://github.com/solo-io/solo-projects/issues/5763#issuecomment-1957912003
      • I think a lot of the information here around persistent volumes etc. is not necessary -- before I figured out that we needed to use the admin interface, I was having a lot of trouble generating profiles and most of those instructions are left over from that process
    • feel free to reach out/pair with me on this step -- I think I have a pretty good grasp on it at this point
  5. publish wrapper docker image
    • When we delivered the last image, I just manually pushed the wrapper image to the gcr.io/solo-test-236622 registry, as gcr.io/solo-test-236622/gloo-ee-envoy-wrapper:1.15.7-gperftools-tcmalloc-alpine
    • We can probably also publish the image using the test release action in solo-projects
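
For the validation in step 4, here is a rough Python sketch of the three manual actions: toggling the profiler, copying the profile out, and viewing it. It is only an illustration under stated assumptions, not the procedure from the linked issue comment. The /cpuprofiler?enable=y|n endpoint comes from the upstream Envoy admin interface. The admin address (localhost:19000, e.g. behind a port-forward), the container name, and the binary path are assumptions. The profile path is the CPUPROFILE value from step 2.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the Envoy admin interface is reachable at this address,
# for example through a port-forward to the proxy pod.
ADMIN_BASE = "http://localhost:19000"

# From step 2 in this thread: CPUPROFILE=/tmp/mybin.cpuprof
PROFILE_PATH = "/tmp/mybin.cpuprof"


def cpuprofiler_url(enable: bool, base: str = ADMIN_BASE) -> str:
    """Build the admin URL that switches the CPU profiler on or off."""
    query = urlencode({"enable": "y" if enable else "n"})
    return f"{base}/cpuprofiler?{query}"


def toggle_cpu_profiler(enable: bool, base: str = ADMIN_BASE) -> int:
    """POST to the admin endpoint and return the HTTP status code."""
    request = Request(cpuprofiler_url(enable, base), method="POST")
    with urlopen(request, timeout=10) as response:
        return response.status


def docker_cp_command(container: str, dest: str = ".") -> list[str]:
    """Command that copies the profile out of a running container."""
    return ["docker", "cp", f"{container}:{PROFILE_PATH}", dest]


def pprof_text_command(binary: str, profile: str) -> list[str]:
    """Command that prints a text report; it needs the debug-symbol binary."""
    return ["pprof", "-text", binary, profile]
```

A run would then look like: call toggle_cpu_profiler(True), drive the load test, call toggle_cpu_profiler(False), and execute the two commands (for example with subprocess.run). If the text report shows only a single entry rather than many function names, the binary passed to pprof is probably missing debug symbols.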

I linked this in the above comment, but it is worthwhile to be aware of the upstream envoy documentation around this functionality: https://github.com/envoyproxy/envoy/blob/main/bazel/PPROF.md

I have found that it's pretty incomplete, so take anything you read there as a half-truth, but it does have a lot of useful information

Another important point, related to step 1:

  • Our support for debug builds is broken in envoy-gloo/envoy-gloo-ee LTS version 1.26 and greater, as a consequence of the steps we took to do the v1.25 -> v1.26 bump
  • Here is the PR I used to build the last image for this purpose: https://github.com/solo-io/envoy-gloo-ee/pull/731/
    • the PR is old, so the files view is not particularly useful, but the commits will show the steps I took in order to create a working build of envoy with debug symbols
    • We should be able to apply similar steps to create this binary, although they will possibly be slightly different, as we are compiling a 1.27 binary instead of 1.26
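
To make the two bazel flags from step 1 concrete, here is a small Python sketch that assembles the build command. The flags are the ones listed in step 1. The target label is deliberately left as a parameter, because this thread does not name the envoy-gloo-ee build target; the label in the usage line is a hypothetical placeholder. The extra changes needed to unbreak debug builds on 1.26+ are in the PR above and are not reflected here.

```python
from __future__ import annotations

import shlex


def bazel_debug_build_command(target: str) -> list[str]:
    """Assemble a bazel invocation with debug symbols and gperftools."""
    return [
        "bazel",
        "build",
        "-c", "dbg",                        # debug symbols (step 1)
        "--define", "tcmalloc=gperftools",  # gperftools tcmalloc (step 1)
        target,                             # caller supplies the real label
    ]


if __name__ == "__main__":
    # "//hypothetical:target" is a placeholder, not a real label.
    print(shlex.join(bazel_debug_build_command("//hypothetical:target")))
```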

@ben-taussig-solo On step #2 in this comment, do we just run make docker-local && make docker-release from the envoy-gloo-ee Makefile?

This is complete and was sent via Slack.

More information can be found here