cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement

Home Page: https://tetragon.io


Creating a TracingPolicyNamespaced with the same name for a different namespace does not get applied.

joshuajorel opened this issue · comments

What happened?

In our K8s environment, we deployed the same policy to two different namespaces, but only the first policy gets applied. We confirmed this by running the tetra tp list command in the Tetragon pods. We tested this behavior with the fd-install TracingPolicyNamespaced config in two different namespaces (default and test):
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "fd-install"
spec:
  kprobes:
  - call: "fd_install"
    syscall: false
    args:
    - index: 0
      type: "int"
    - index: 1
      type: "file"
    selectors:
    - matchArgs:
      - index: 1
        operator: "Equal"
        values:
        - "/tmp/tetragon"
      matchActions:
      - action: Sigkill

The following is the output of tetra tp list:

[5] fd-install enabled:true filterID:5 namespace:default sensors:gkp-sensor-5

Only one policy is applied; the policy for the test namespace is not, even though the k8s resource exists.

Tetragon Version

CLI version: v1.0.1
Server version: v1.0.2 (installed via Helm)

Kernel Version

Linux ubuntu-noble 6.8.0-11-generic #11-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 14 00:29:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3

Bugtool

time="2024-04-03T23:20:06Z" level=info msg="saving init info"
time="2024-04-03T23:20:06Z" level=info msg="retrieving lib directory" libDir=/var/lib/tetragon/
time="2024-04-03T23:20:06Z" level=warning msg="not an object file, ignoring" path=/var/lib/tetragon/
time="2024-04-03T23:20:10Z" level=info msg="skipping metadata directory" path=/var/lib/tetragon/metadata
time="2024-04-03T23:20:10Z" level=warning msg="no btf filename in tetragon config, attempting to fall back to /sys/kernel/btf/vmlinux"
time="2024-04-03T23:20:11Z" level=info msg="btf file added" btfFname=/sys/kernel/btf/vmlinux
time="2024-04-03T23:20:11Z" level=info msg="tetragon log file added" exportFname=/var/run/cilium/tetragon/tetragon.log
time="2024-04-03T23:20:11Z" level=info msg="contacting metrics server" metricsAddr="http://localhost:2112/metrics"
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd=/bin/dmesg dstFname=dmesg.out ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev lo ingress" dstFname=tc-info.lo.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev lo egress" dstFname=tc-info.lo.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev eth0 ingress" dstFname=tc-info.eth0.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev eth0 egress" dstFname=tc-info.eth0.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethf85e1c33 ingress" dstFname=tc-info.vethf85e1c33.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethf85e1c33 egress" dstFname=tc-info.vethf85e1c33.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd2bc04e0 ingress" dstFname=tc-info.vethd2bc04e0.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd2bc04e0 egress" dstFname=tc-info.vethd2bc04e0.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd92333b8 ingress" dstFname=tc-info.vethd92333b8.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd92333b8 egress" dstFname=tc-info.vethd92333b8.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethc9f1bfea ingress" dstFname=tc-info.vethc9f1bfea.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethc9f1bfea egress" dstFname=tc-info.vethc9f1bfea.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdc8843f6 ingress" dstFname=tc-info.vethdc8843f6.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdc8843f6 egress" dstFname=tc-info.vethdc8843f6.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdd056db0 ingress" dstFname=tc-info.vethdd056db0.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdd056db0 egress" dstFname=tc-info.vethdd056db0.egress ret=0
time="2024-04-03T23:20:12Z" level=info msg="executed command" cmd="/usr/bin/bpftool map show -j" dstFname=bpftool-maps.json ret=0
time="2024-04-03T23:20:12Z" level=info msg="executed command" cmd="/usr/bin/bpftool prog show -j" dstFname=bpftool-progs.json ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/bpftool cgroup tree -j" dstFname=bpftool-cgroups.json ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/gops stack localhost:8118" dstFname=gops.stack ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/gops stats localhost:8118" dstFname=gpos.stats ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/gops memstats localhost:8118" dstFname=gops.memstats ret=0
time="2024-04-03T23:20:13Z" level=info msg="dumped tracing policies in tracing-policies.json"

Relevant log output

No response

Anything else?

No response

Indeed, this is currently the case: the policy name must be unique across all other policies, and I believe this also includes non-namespaced policies. This can be fixed, but it requires significant changes. My suggestion would be to use different policy names.

Is that the intended behavior? I can understand that policy names need to be unique for non-namespaced policies. However, it doesn't seem intuitive from a k8s perspective to disallow this.

It's not the intended behavior, and I agree that it's counterintuitive.

Originally, Tetragon did not support namespaced policies, so we used the policy name as a key to uniquely identify a policy. When we introduced namespaced policies, this was not changed, and we were left with the above limitation.

Internally, we maintain a mapping from a string (the policy name) to a collection:

collections map[string]collection

Which is the internal state we keep for each policy:

// collection is a collection of sensors.
// This can either be created from a tracing policy, or by loading sensors independently
// for sensors that are not loaded via a tracing policy (e.g., base sensor) and testing.
type collection struct {
    // ...
}

Changing the code so that we use something like the following for the key:

type collection_key struct {
    name, namespace string
}

should allow us to have the same policy name in different namespaces.

Would this be something the community would be interested in? I can contribute the change if it's not already being worked on.

We've discussed this in the community call yesterday (https://docs.google.com/document/d/1BFMJLdtisiCSLfMct0GHof_ioL-5QVNLEaeMSlk_7Eo/edit) and the consensus was that this is something the community would definitely be interested in.

I'm not aware of anyone working on it, and we would gladly take this contribution. Happy to also guide along the way.

Thanks!

@kkourt I created a draft PR here: #2337

The namespace policy does get separated:

[kind-tetragon-dev|kube-system] (base) ➜  ~ kubectl exec  ds/tetragon -c tetragon -- tetra tp list

ID   NAME                       STATE     FILTERID   NAMESPACE   SENSORS
2    file-monitoring-filtered   enabled   2          test        gkp-sensor-2
3    file-monitoring-filtered   enabled   3          test2       gkp-sensor-3

However, the policy doesn't seem to capture the events. Any clue as to where I should look?

Cool, thanks!

However, the policy doesn't seem to capture the events. Any clue as to where I should look?

Does everything work as expected if the policies have different names?

@kkourt the policies do not seem to be enforced. I also don't see process_exit events as you normally should. Any suggestions where to look next?

So you mean that even if the policy names are not the same, the policies do not take effect?
Can you please open a separate issue for this? Please include a sysdump, the policies themselves, and the expected and actual results of the policies.

@kkourt - Just did a sanity check: I rebuilt the codebase from the main branch without these code changes, and the sample policies are indeed not taking effect. Are there any known issues using WSL2? Otherwise, I will have to test in a different environment to confirm this behavior.

I'm not sure about WSL2, but I wouldn't be surprised if there was an issue with it. Can you please create another issue for it? It should be possible to figure out what's wrong from a sysdump.

@kkourt created a separate issue #2338

thanks!

Are there any known issues using WSL2? Otherwise, I will have to test in a different environment to confirm this behavior.

It seems that WSL2 is not working properly. We would need to investigate further to figure out how to address the issue. In the meantime, would it be possible to use another environment (e.g., a normal Linux VM) for testing? Thanks!

@kkourt I'll reinstall my tools in a VM and continue testing there.