cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement

Home Page: https://tetragon.io

Delete metrics with policy label on policy delete

lambdanis opened this issue · comments

The tetragon_policy_events_total metric has a policy label. When a TracingPolicy is deleted, the corresponding metrics are still exposed by Tetragon, leading to overhead (usually small, since policies are not churned frequently, but it might be significant in some cases).

There was a similar problem with metrics that have a pod label. Now when a pod is deleted, Tetragon deletes its metrics from the registry (see https://github.com/cilium/tetragon/blob/main/pkg/metrics/metricwithpod.go).
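
For reference, the pod cleanup roughly follows this pattern (a minimal sketch, not the actual metricwithpod.go code; the variable and helper names here are illustrative):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// metricsWithPod stands in for the list of metric vectors registered with a
// "pod" label; the name is illustrative, not Tetragon's actual one.
var metricsWithPod []*prometheus.MetricVec

// DeleteMetricsForPod drops every series whose "namespace" and "pod" labels
// match the deleted pod. DeletePartialMatch requires client_golang >= v1.12.
func DeleteMetricsForPod(namespace, pod string) {
	for _, m := range metricsWithPod {
		m.DeletePartialMatch(prometheus.Labels{
			"namespace": namespace,
			"pod":       pod,
		})
	}
}
```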

We should add a similar handler for policy delete events. One small gotcha is that at the moment metrics don't distinguish between clusterwide and namespaced policies, so we should first add another label to distinguish between different policy kinds.

Hello @lambdanis
Just to get a proper understanding of the issue.

We are talking about the tetragon_policy_events_total metric, which does not have a delete handler registered in main like the pod one, i.e. something that makes sure that when a Tetragon policy is deleted, the corresponding metrics are also deleted.

So, to resolve this, the following things need to be done:

  • Add a new label that tells whether a policy is clusterwide or namespaced. (What do you suggest the label be named?)
  • Implement and register a handler that makes sure tracing policy metrics are also deleted when the policy itself is deleted.

If I got it right, is there a deadline for resolving this issue? I would like to work on it.

Hi @prateek041

Add a new label that tells whether a policy is clusterwide or namespaced. (What do you suggest the label be named?)

Yes, maybe "policy_kind" and "policy_namespace" (empty for clusterwide) would be good.

Implement and register a handler that makes sure tracing policy metrics are also deleted when the policy itself is deleted.

Something similar. I think the metrics deletion logic should be triggered from DeleteTracingPolicy in the sensors manager rather than from the k8s API watcher (WatchTracePolicy). That way stale metrics will also be cleaned up when policies are created via the Tetragon API rather than via the k8s CRD.
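
Very roughly, and with everything except the placement of the cleanup call stripped down to stand-ins, that could look like:

```go
package sensors

// deletePolicyMetrics points at the metric cleanup helper (e.g. the
// hypothetical DeletePolicyMetrics sketched above); wiring it up as a
// function value keeps this sketch free of a real import.
var deletePolicyMetrics func(name, namespace string)

// Manager is a stand-in for the sensor manager; the real one lives in
// pkg/sensors and carries much more state.
type Manager struct{}

// DeleteTracingPolicy sketches where the cleanup could be triggered: after
// the policy's sensors have been unloaded, drop its metric series. Doing it
// here (rather than in the k8s watcher) also covers policies created via
// the Tetragon gRPC API.
func (m *Manager) DeleteTracingPolicy(name, namespace string) error {
	// ... unload and remove the sensors belonging to the policy ...

	if deletePolicyMetrics != nil {
		deletePolicyMetrics(name, namespace)
	}
	return nil
}
```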

If I got it right, is there a deadline for resolving this issue? I would like to work on it.

No deadline. I don't recall any reports of this being a big problem, so it's just a nice-to-have. Feel free to open a PR, or post here if anything is unclear.

Sure, thanks for the pointers, I have started working on it. Please assign it to me.