Resource consumption is not limited and the pod is not OOM killed
cedricmoulard opened this issue · comments
Description
I want to sandbox a pod with gVisor and limit its resource consumption (CPU and memory).
I am using containerd as the container runtime.
I noticed that the pod consumes more memory and CPU than it should. I have tried many configurations, but it seems that gVisor does not enforce the limits.
Steps to reproduce
Configuration
Runsc
File /etc/containerd/runsc.toml
log_path = "/var/log/runsc/%ID%/shim.log"
log_level = "debug"
[runsc_config]
debug = "true"
debug-log = "/var/log/runsc/%ID%/gvisor.%COMMAND%.log.json"
debug-log-format = "json"
Containerd
~# ctr --version
ctr github.com/containerd/containerd v1.7.13
File /etc/containerd/config.toml
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
[plugins."io.containerd.grpc.v1.cri".containerd]
no_pivot = false
default_runtime_name = "runc"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
runtime_type = "io.containerd.runsc.v1"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
TypeUrl = "io.containerd.runsc.v1.options"
BinaryName = "/usr/local/bin/runsc"
ConfigPath = "/etc/containerd/runsc.toml"
Execute
Kubernetes resources
I am using stress-ng to request 2048Mi of memory and about 9 vCPU.
I am setting the container resource limits to 1024Mi and 1 vCPU.
---
apiVersion: v1
kind: Namespace
metadata:
  name: test-gvisor
---
apiVersion: v1
kind: Pod
metadata:
  name: memory-test-sandboxed
  namespace: test-gvisor
spec:
  runtimeClassName: gvisor
  containers:
  - args:
    - -c 2
    - -t 600s
    - -m 8
    - -M
    image: polinux/stress-ng
    name: memory-test-sandboxed
    resources:
      limits:
        cpu: "1"
        memory: 1024Mi
      requests:
        cpu: "1"
        memory: 1024Mi
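The pod above references runtimeClassName: gvisor, which assumes a matching RuntimeClass exists on the cluster. For completeness, a typical definition (assuming the containerd runtime key is runsc, as configured in /etc/containerd/config.toml above) would look like:

```yaml
# Assumed RuntimeClass backing "runtimeClassName: gvisor".
# The handler must match the containerd runtime name ("runsc").
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
```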
Get pod and containers ID/UID
export POD_ID=$(crictl pods --name memory-test-sandboxed -v -o json | jq -r ".items[0].id")
export POD_UID=$(crictl pods --name memory-test-sandboxed -v -o json | jq -r ".items[0].metadata.uid")
export POD_UID_UNDERSCORED=$(echo "$POD_UID" | tr '-' '_')
echo "POD_ID: ${POD_ID}"
echo "POD_UID: ${POD_UID}"
echo "POD_UID_UNDERSCORED: ${POD_UID_UNDERSCORED}"
export CONTAINER_ID=$(crictl ps -v -o json --pod $POD_ID | jq -r ".containers[0].id")
echo "CONTAINER_ID: ${CONTAINER_ID}"
Inspect Pod and Container
crictl inspect $CONTAINER_ID > /var/log/runsc/${CONTAINER_ID}/config.json
crictl inspectp $POD_ID > /var/log/runsc/${POD_ID}/config.json
crictl stats $CONTAINER_ID
List Logs
ls -ll /var/log/runsc/${POD_ID}
ls -ll /var/log/runsc/${CONTAINER_ID}
Get cgroup information
ls -ll /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID}
ls -ll /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID}
Check memory
CGROUP_EXPORT_FILE=/var/log/runsc/${POD_ID}/cgroup.txt
touch $CGROUP_EXPORT_FILE
echo "================================= SYSTEMD CGROUP ${POD_ID}" >> $CGROUP_EXPORT_FILE
echo "================================= K8s POD CRI CONTAINER ${POD_ID}" >> $CGROUP_EXPORT_FILE
echo "memory.max:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID}/memory.max >> $CGROUP_EXPORT_FILE
echo "memory.current:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID}/memory.current >> $CGROUP_EXPORT_FILE
echo "================================= k8s CONTAINER CRI CONTAINER ${CONTAINER_ID}" >> $CGROUP_EXPORT_FILE
echo "memory.max:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID}/memory.max >> $CGROUP_EXPORT_FILE
echo "memory.current:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID}/memory.current >> $CGROUP_EXPORT_FILE
echo "================================= KUBEPODS CGROUP ${POD_ID}" >> $CGROUP_EXPORT_FILE
echo "memory.max:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice/memory.max >> $CGROUP_EXPORT_FILE
echo "memory.current:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice/memory.current >> $CGROUP_EXPORT_FILE
echo "================================= STATS CONTAINER ${CONTAINER_ID}" >> $CGROUP_EXPORT_FILE
echo "memory usage in bytes:" >> $CGROUP_EXPORT_FILE
crictl stats -o json $CONTAINER_ID | jq -r ".stats[0].memory.usageBytes.value" >> $CGROUP_EXPORT_FILE
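To make the mismatch explicit, the captured numbers can be compared directly against the 1Gi limit. This is a hypothetical sanity check (the byte values are taken from the Results section below), not part of the original repro:

```shell
# Hypothetical check: compare the usage reported by crictl stats
# against the 1Gi (1073741824-byte) limit set on the container.
limit_bytes=1073741824   # memory.max of the kubepods pod slice
usage_bytes=2208296960   # usageBytes from crictl stats (see Results)
if [ "$usage_bytes" -gt "$limit_bytes" ]; then
  echo "limit not enforced: usage exceeds memory.max"
fi
```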
Results
All logs are available here: https://github.com/cedricmoulard/gvisor-ressources-issue
Pod on cluster
I expect the pod to be OOM killed, or to use less than 1Gi of memory and 1 vCPU.
kubectl top po
NAME CPU(cores) MEMORY(bytes)
memory-test-sandboxed 9074m 2093Mi
Cgroups
cat $CGROUP_EXPORT_FILE
================================= SYSTEMD CGROUP 02521dbbb0016b638eccb79d4362ff927dca72a9ebb4f6830781e82fcbc920af
================================= K8s POD CRI CONTAINER 02521dbbb0016b638eccb79d4362ff927dca72a9ebb4f6830781e82fcbc920af
memory.max:
max
memory.current:
2117292032
================================= k8s CONTAINER CRI CONTAINER 1d5c25b3695fc85879172bfac423f4417e0fdaeb29e4a27cb99c6db2712eed99
memory.max:
max
0
================================= KUBEPODS CGROUP 02521dbbb0016b638eccb79d4362ff927dca72a9ebb4f6830781e82fcbc920af
memory.max:
1073741824
memory.current:
0
================================= STATS CONTAINER 1d5c25b3695fc85879172bfac423f4417e0fdaeb29e4a27cb99c6db2712eed99
memory usage in bytes:
2208296960
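For comparison with the kubectl top output, the byte count reported by crictl stats converts to MiB as follows (simple arithmetic, not part of the original repro; the small gap versus the ~2093Mi above is just sampling at different moments):

```shell
# Convert the usage reported by crictl stats to MiB
# for comparison with kubectl top.
usage_bytes=2208296960
echo "$(( usage_bytes / 1024 / 1024 ))Mi"   # prints "2105Mi"
```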
runsc version
runsc version release-20240422.0
spec: 1.1.0-rc.1
docker version (if using docker)
No response
uname
Linux k8s-test-gvisor-kosmos-node01 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)
All logs are available here: https://github.com/cedricmoulard/gvisor-ressources-issue
From the logs you shared it looks like you/containerd are specifying a systemd cgroup path (format slice:cri-containerd:uid) but not specifying systemd-cgroup=true in runsc_config. Can you try adding that flag and seeing if you get the same behavior?
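For anyone hitting the same issue: if I read the suggestion correctly, with that flag added the /etc/containerd/runsc.toml from the report would look like this (keeping the original logging options):

```toml
log_path = "/var/log/runsc/%ID%/shim.log"
log_level = "debug"
[runsc_config]
debug = "true"
debug-log = "/var/log/runsc/%ID%/gvisor.%COMMAND%.log.json"
debug-log-format = "json"
systemd-cgroup = "true"
```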
Yes, it's working, thank you
@manninglucas Can we autodetect whether or not systemd-based cgroup control should be enabled?
@EtiennePerot Maybe, but I think we should always try to stay in line with what runc does. runc doesn't attempt to auto-detect systemd-based configuration; it just reads whatever the user sets for the --systemd-cgroup flag (default: false) [1], same as runsc. I can add a short README to the runsc systemd folder clarifying how this works to help future users avoid this confusion.
[1] https://github.com/opencontainers/runc/blob/e8bec1ba40039a004d57ddc0a9afec9a8364172b/docs/systemd.md
Fair enough, but perhaps we could also emit a warning in the runsc logs when this situation is detected?
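Such a detection heuristic could be fairly simple: systemd cgroup paths follow the slice:prefix:name form, while cgroupfs paths look like ordinary directory paths. A rough shell sketch (the literal path is illustrative, and the jq query in the comment is an assumption about crictl's output shape):

```shell
# Hypothetical heuristic: systemd cgroup paths use the "slice:prefix:name"
# form, while cgroupfs paths look like plain directory paths.
# In practice the path would come from the OCI spec, e.g.:
#   crictl inspectp "$POD_ID" | jq -r '.info.runtimeSpec.linux.cgroupsPath'
CGROUPS_PATH="kubepods-pod123.slice:cri-containerd:abc123"
case "$CGROUPS_PATH" in
  *.slice:*:*) echo "systemd-style path: warn if systemd-cgroup is false" ;;
  *)           echo "cgroupfs-style path" ;;
esac
```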