Provide pprof enabled envoy image for glooe 1.16.2-alpine
DuncanDoyle opened this issue · comments
Gloo Edge Product
Enterprise
Gloo Edge Version
v1.16.2-alpine
Is your feature request related to a problem? Please describe.
We are facing an issue with dynamic metadata performance and need a pprof-enabled image to troubleshoot it.
Describe the solution you'd like
I'd like a pprof-enabled image, together with instructions on how to use it to gather CPU profiles during load tests.
Describe alternatives you've considered
No response
Additional Context
https://solo-io.zendesk.com/agent/tickets/3155
referring to these instructions for how to build this
The steps to build an image suitable for collecting CPU profiles are:
- build envoy(-gloo-ee) with debug symbols and gperftools
  - debug symbols: pass the `-c dbg` flag to bazel
  - tcmalloc: pass `--define tcmalloc=gperftools` to bazel
  - note: we should kick this off this afternoon, so that we will have the binary ready by tomorrow morning
  - note: we must build from the version of envoy-gloo-ee used in the version of gloo EE that the client has installed. In this case, the client is using v1.16.2-alpine, which maps to envoy-gloo-ee v1.27.3-patch1
- build envoy(-gloo-ee) docker image
  - note that I'm referring to the docker images built from the envoy-gloo/envoy-gloo-ee repos, not the *-wrapper images built in our control plane repos
  - for example, the enterprise Dockerfile can be found here (envoy-gloo-ee/ci/Dockerfile)
  - We need to update this Dockerfile to use alpine as a base image
  - at the time of writing, the most recent commit targeting this Dockerfile updated it to use ubuntu as a base image, so you can just revert to the prior commit to get a valid alpine-based Dockerfile
  - We need to update this Dockerfile to set the `CPUPROFILE` environment variable to a location to dump CPU profiles to. I used `CPUPROFILE=/tmp/mybin.cpuprof`, so I think we should probably leave that as-is for this build
  - more info here, although YMMV with this doc: https://github.com/envoyproxy/envoy/blob/main/bazel/PPROF.md
  - we can probably also set this in the wrapper image, but doing it here is useful in case we want to test this "base image"
- build envoy(-gloo-ee) wrapper docker image
  - I am not sure about the state of the `v1.16.2-alpine` tag in solo-projects, but I imagine we should just be able to update the `ENVOY_GLOO_IMAGE_VERSION` variable in the Makefile and then run `make gloo-ee-envoy-wrapper-docker` to build the wrapper image
- validate that you can collect profiles from the image, and that the profiles contain meaningful data
  - make sure to collect the profiles by toggling the admin interface endpoint
  - make sure you can easily copy the profiles out of the docker container
  - make sure that when you view the profiles in pprof, there are references to lots of different function calls (and not just a single entry for envoy or something like that)
  - we should follow the steps here (please note -- specifically the steps in my LAST comment on the issue) to do this validation: https://github.com/solo-io/solo-projects/issues/5763#issuecomment-1957912003
  - I think a lot of the information here around persistent volumes etc. is not necessary -- before I figured out that we needed to use the admin interface, I was having a lot of trouble generating profiles and most of those instructions are left over from that process
  - feel free to reach out/pair with me on this step -- I think I have a pretty good grasp on it at this point
- publish wrapper docker image
  - When we delivered the last image, I just manually pushed the wrapper image to the `gcr.io/solo-test-236622` registry, as `gcr.io/solo-test-236622/gloo-ee-envoy-wrapper:1.15.7-gperftools-tcmalloc-alpine`
  - We can probably also publish the image using the test release action in solo-projects
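
As a rough sketch, the two build commands from the steps above could be wrapped in a small script like the one below. The `-c dbg` and `--define tcmalloc=gperftools` flags, the `ENVOY_GLOO_IMAGE_VERSION` variable and the `gloo-ee-envoy-wrapper-docker` target come from this thread. Everything else is an assumption for illustration: the bazel target name is the one used by upstream Envoy and may be different in envoy-gloo-ee, the checkout directories and the image tag are hypothetical, and the variable is passed on the `make` command line (a standard make override) instead of being edited in the Makefile. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical checkout locations; adjust to your environment.
ENVOY_GLOO_EE_DIR="${ENVOY_GLOO_EE_DIR:-./envoy-gloo-ee}"
SOLO_PROJECTS_DIR="${SOLO_PROJECTS_DIR:-./solo-projects}"

# Assumption: upstream Envoy's static binary target; envoy-gloo-ee may use another name.
ENVOY_TARGET="${ENVOY_TARGET:-//source/exe:envoy-static}"

# Print the command (dry run) or run it inside the given directory.
run_in() {
  local dir="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ (cd $dir && $*)"
  else
    (cd "$dir" && "$@")
  fi
}

build_envoy() {
  # -c dbg: debug symbols; --define tcmalloc=gperftools: link gperftools for CPU profiling
  run_in "$ENVOY_GLOO_EE_DIR" bazel build -c dbg --define tcmalloc=gperftools "$ENVOY_TARGET"
}

build_wrapper() {
  # $1 is the envoy-gloo-ee image tag the wrapper image should be built on
  run_in "$SOLO_PROJECTS_DIR" make gloo-ee-envoy-wrapper-docker ENVOY_GLOO_IMAGE_VERSION="$1"
}

build_envoy
# Hypothetical tag for the debug-enabled envoy-gloo-ee image.
build_wrapper "${ENVOY_GLOO_IMAGE_VERSION:-1.27.3-patch1-dbg}"
```

Running it as-is prints the two commands so they can be reviewed first; `DRY_RUN=0` executes them, which requires bazel, make and both checkouts to be present.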
I linked this in the above comment, but it is worthwhile to be aware of the upstream envoy documentation around this functionality: https://github.com/envoyproxy/envoy/blob/main/bazel/PPROF.md
I have found that it's pretty incomplete, so take anything you read there as a half-truth, but it does have a lot of useful information
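
For the validation step, a minimal sketch of collecting a profile by toggling the admin interface could look like the script below. The `/cpuprofiler?enable=y|n` admin endpoint is part of Envoy's admin interface and `/tmp/mybin.cpuprof` is the `CPUPROFILE` value mentioned above. The container name, the admin address (port 19000 is assumed here) and the local output file are hypothetical. Where the profile is actually written can also depend on the admin `profile_path` setting in the Envoy bootstrap, so treat the path as something to verify. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumptions, not taken from the thread: container name and admin address.
CONTAINER="${CONTAINER:-envoy-pprof}"
ADMIN="${ADMIN:-localhost:19000}"
# CPUPROFILE value used in the thread; verify where your build writes the profile.
PROFILE_PATH="${PROFILE_PATH:-/tmp/mybin.cpuprof}"

collect_profile() {
  local seconds="$1"
  # Start the CPU profiler through the admin interface.
  run curl -s -X POST "http://${ADMIN}/cpuprofiler?enable=y"
  # Keep profiling for the given number of seconds; run the load test in this window.
  run sleep "$seconds"
  # Stop the profiler so the profile is flushed to disk.
  run curl -s -X POST "http://${ADMIN}/cpuprofiler?enable=n"
  # Copy the profile out of the container.
  run docker cp "${CONTAINER}:${PROFILE_PATH}" ./envoy.cpuprof
}

collect_profile 30
```

The copied file can then be opened in pprof together with the debug binary to check that it references many different function calls rather than a single entry.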
Another important point, related to step 1:
- Our support for debug builds is broken in envoy-gloo/envoy-gloo-ee LTS version 1.26 and greater, as a consequence of the steps we took to do the v1.25 -> v1.26 bump
- See issue here: https://github.com/solo-io/envoy-gloo-ee/issues/718
- Here is the PR I used to build the last image for this purpose: https://github.com/solo-io/envoy-gloo-ee/pull/731/
- the PR is old, so the files view is not particularly useful, but the commits will show the steps I took in order to create a working build of envoy with debug symbols
- We should be able to apply similar steps to create this binary, although they will possibly be slightly different, as we are compiling a 1.27 binary instead of 1.26
@ben-taussig-solo On step #2 in this comment, do we just run `make docker-local && make docker-release` from the envoy-gloo-ee Makefile?
This is complete and was sent via Slack.
More information can be found here