Watch requests for CRs cost about 10-15x more memory in kube-apiserver than in-tree resource watches
nagygergo opened this issue
What happened?
I was running some load testing related to Flux. When creating 10,000 Kustomization custom resources (about 1 KiB each), the kube-apiserver consumes about 1 GiB of memory. When checking with 100,000 and 300,000 resources, the kube-apiserver's memory usage scales linearly with the object count.
When doing the same thing with 1 KiB ConfigMaps, creating 10,000 resources, the kube-apiserver consumes about 100 MiB of memory.
Memory pprof for 10k kustomizations:
Memory pprof for 10k configmaps:
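For reference, a heap profile like the ones above can be captured roughly as follows (an assumption of mine, not necessarily how the profiles above were taken; it requires the apiserver's default --profiling=true and access to the /debug/pprof non-resource URLs):
# Dump the apiserver heap profile and list the top allocators
kubectl get --raw /debug/pprof/heap > heap.pprof
go tool pprof -top heap.pprof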
The memory usage stays the same for as long as the resources exist. After looking into what might cause this, it seems that the kube-controller-manager sets up a watch for the kustomizations/configmaps resources.
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"a4187168-a301-4fd1-907a-09da8fc3b587","stage":"RequestReceived","requestURI":"/apis/kustomize.toolkit.fluxcd.io/v1/kustomizations?allowWatchBookmarks=true\u0026resourceVersion=727\u0026timeout=5m37s\u0026timeoutSeconds=337\u0026watch=true","verb":"watch","user":{"username":"system:kube-controller-manager","groups":["system:authenticated"]},"sourceIPs":["172.18.0.2"],"userAgent":"kube-controller-manager/v1.29.2 (linux/amd64) kubernetes/4b8e819/metadata-informers","objectRef":{"resource":"kustomizations","apiGroup":"kustomize.toolkit.fluxcd.io","apiVersion":"v1"},"requestReceivedTimestamp":"2024-04-29T14:33:32.979279Z","stageTimestamp":"2024-04-29T14:33:32.979279Z"}
This is needed because the garbage collector that runs in kube-controller-manager needs to walk the ownership reference graph, and it wants to do that from an in-memory cache (hence the metadata-informers user agent in the audit log above).
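One way to confirm these watches exist (a hypothetical check, not part of the original measurements; apiserver_registered_watchers is a standard kube-apiserver metric) is to count the watchers the apiserver currently serves per group/kind:
kubectl get --raw /metrics | grep apiserver_registered_watchers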
What did you expect to happen?
I would have expected similar memory usage for in-tree and custom resources.
Also, the current garbage collector seems to force the kube-apiserver to cache the full contents of etcd. Is that the intended behavior?
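To see how much the watch cache holds per resource, the apiserver's watch-cache metrics can be inspected (again a hypothetical check of mine, not something from the original report):
kubectl get --raw /metrics | grep watch_cache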
How can we reproduce it (as minimally and precisely as possible)?
- Create a cluster:
  kind create cluster
- Add the kustomize CRD:
  curl -L https://raw.githubusercontent.com/fluxcd/kustomize-controller/main/config/crd/bases/kustomize.toolkit.fluxcd.io_kustomizations.yaml | kubectl apply -f -
- Create 10k of the following (one way to script this is sketched after the manifest):
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: podinfo
spec:
  interval: 10m
  targetNamespace: default
  sourceRef:
    kind: GitRepository
    name: podinfo
  path: "./kustomize"
  prune: true
  timeout: 1m
  patches:
    - patch: |-
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: not-used
        spec:
          template:
            metadata:
              annotations:
                cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
      target:
        kind: Deployment
        labelSelector: "app.kubernetes.io/part-of=my-app"
    - patch: |
        - op: add
          path: /spec/template/spec/securityContext
          value:
            runAsUser: 10000
            fsGroup: 1337
        - op: add
          path: /spec/template/spec/containers/0/securityContext
          value:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            capabilities:
              drop:
                - ALL
      target:
        kind: Deployment
        name: podinfo
        namespace: apps
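One possible way to create the 10k copies with unique names (a sketch; the kustomization.yaml filename and yq v4 are assumptions of mine, not from the original report):
# Stamp out 10,000 uniquely named copies of the manifest above
for i in $(seq 1 10000); do
  yq ".metadata.name = \"podinfo-$i\"" kustomization.yaml | kubectl create -f -
done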
Anything else we need to know?
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/sig apimachinery
@nagygergo: The label(s) sig/apimachinery cannot be applied, because the repository doesn't have them.
In response to this:
/sig apimachinery
/sig api-machinery