cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement

Home page: https://tetragon.io

Tetragon with ociHookSetup.enabled cannot start in a namespace other than kube-system

f1ko opened this issue

What happened?

Description

When the OCI hook feature is used, an exception is added for all Pods in the kube-system namespace so that Tetragon itself can start, because the hook depends on the Tetragon agent.
However, this results in a deadlock if Tetragon is deployed in any other namespace.
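The deadlock can be illustrated with a small shell model. This is a hypothetical simplification, not Tetragon's actual hook code: it assumes the hook allows a container if its Pod is in kube-system, otherwise only if the agent is already running, and denies it in every other case (exit status 1, which makes runc abort container creation, as in the kubelet events of this report). The function names `hook_allows` and `simulate_agent_start` are made up for illustration.

```sh
#!/bin/sh
# Hypothetical model of the reported deadlock; NOT Tetragon's real OCI hook logic.
# Assumption: kube-system Pods bypass the agent check; all other Pods
# are only allowed if the Tetragon agent is already running.

# hook_allows NAMESPACE AGENT_RUNNING -> returns 0 (allow) or 1 (deny)
hook_allows() {
  if [ "$1" = "kube-system" ]; then return 0; fi  # namespace exception
  if [ "$2" = "true" ]; then return 0; fi         # agent reachable
  return 1                                        # hook fails, runc aborts
}

# simulate_agent_start NAMESPACE ATTEMPTS -> prints "started" or "stuck"
simulate_agent_start() {
  ns="$1"
  attempts="$2"
  agent_running="false"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # The agent's own container is subject to the hook while the agent is down.
    if hook_allows "$ns" "$agent_running"; then
      echo "started"
      return 0
    fi
    i=$((i + 1))
  done
  echo "stuck"
}

simulate_agent_start kube-system 5   # prints "started"
simulate_agent_start tetragon 5      # prints "stuck": every retry is denied
```

In this model, the agent's containers go through the same check while the agent is not yet running, so the namespace exception is the only way out, which matches the behavior reported here.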


Reproduction

Install Tetragon with the OCI hook feature enabled in another namespace (i.e. not in kube-system):

kubectl create ns tetragon
helm install --namespace tetragon \
        --set tetragonOperator.image.override=localhost/cilium/tetragon-operator:latest \
        --set tetragon.image.override=localhost/cilium/tetragon:latest  \
        --set tetragon.grpc.address="unix:///var/run/cilium/tetragon/tetragon.sock" \
        --set tetragon.ociHookSetup.enabled=true \
        tetragon ./install/kubernetes/tetragon

The init container starts as expected and configures the OCI hook.
However, the agent then never starts: the OCI hook cannot reach the agent, and the only exception is for Pods in the kube-system namespace:

$ kubectl describe pod -n tetragon tetragon-tctms
[...]
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  54s                default-scheduler  Successfully assigned tetragon/tetragon-tctms to minikube
  Normal   Pulling    53s                kubelet            Pulling image "localhost/cilium/tetragon:latest"
  Normal   Pulled     53s                kubelet            Successfully pulled image "localhost/cilium/tetragon:latest" in 11ms (11ms including waiting). Image size: 215976744 bytes.
  Normal   Created    53s                kubelet            Created container oci-hook-setup
  Normal   Started    53s                kubelet            Started container oci-hook-setup
  Warning  Failed     43s                kubelet            Error: container create failed: time="2024-04-29T11:36:18Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "
  Warning  Failed     33s                kubelet            Error: container create failed: time="2024-04-29T11:36:28Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "
  Normal   Pulled     21s (x2 over 53s)  kubelet            Container image "quay.io/cilium/hubble-export-stdout:v1.0.4" already present on machine
  Normal   Pulled     11s (x2 over 43s)  kubelet            Container image "localhost/cilium/tetragon:latest" already present on machine
  Warning  Failed     11s                kubelet            Error: container create failed: time="2024-04-29T11:36:50Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "
  Warning  Failed     1s                 kubelet            Error: container create failed: time="2024-04-29T11:37:00Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: "

Consequence

This leaves the cluster in a bad state: no Pods outside kube-system can be created, including Tetragon itself.
The only way to restore cluster functionality at this point is to remove the OCI hook configuration from the node.

Tetragon Version

$ tetra version
CLI version: v1.1.0-pre.0-794-gbdcb413f0

Kernel Version

$ uname -a
Linux minikube 6.4.16 #1 SMP Mon Sep 18 21:45:38 UTC 2023 aarch64 Linux

Kubernetes Version

No response

Bugtool

No response

Relevant log output

No response

Anything else?

No response