cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement

Home Page: https://tetragon.io

Delete metrics with policy label on policy delete

lambdanis opened this issue · comments

The tetragon_policy_events_total metric has a policy label. When a TracingPolicy is deleted, the corresponding metrics are still exposed by Tetragon, leading to overhead (usually small, since policies are not churned frequently, but it might be significant in some cases).

There was a similar problem with metrics that have a pod label. Now when a pod is deleted, Tetragon deletes its metrics from the registry (see https://github.com/cilium/tetragon/blob/main/pkg/metrics/metricwithpod.go).
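
For reference, the pod cleanup roughly follows this pattern (a minimal sketch, not the actual metricwithpod.go code; the variable and helper names here are illustrative):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// metricsWithPod stands in for the list of metric vectors registered with a
// "pod" label; the name is illustrative, not Tetragon's actual one.
var metricsWithPod []*prometheus.MetricVec

// DeleteMetricsForPod drops every series whose "namespace" and "pod" labels
// match the deleted pod. DeletePartialMatch requires client_golang >= v1.12.
func DeleteMetricsForPod(namespace, pod string) {
	for _, m := range metricsWithPod {
		m.DeletePartialMatch(prometheus.Labels{
			"namespace": namespace,
			"pod":       pod,
		})
	}
}
```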

We should add a similar handler for policy delete events. One small gotcha is that at the moment metrics don't distinguish between clusterwide and namespaced policies, so we should first add another label to distinguish between different policy kinds.

Hello @lambdanis
Just to get a proper understanding of the issue.

We are talking about the tetragon_policy_events_total metric, which does not have a delete handler registered in main like the pod one, i.e. something that makes sure that when a Tetragon policy is deleted, the corresponding metrics are also deleted.

So, to resolve this, the following things need to be done:

  • Add a new label that tells whether a policy is clusterwide or namespaced. (What do you suggest the label be named?)
  • Implement and register a handler that makes sure tracing policy metrics are also deleted when the policy itself is deleted.

If I got it right, is there a deadline for resolving this issue? I would like to work on it.

Hi @prateek041

Add a new label that tells whether a policy is clusterwide or namespaced. (What do you suggest the label be named?)

Yes, maybe "policy_kind" and "policy_namespace" (empty for clusterwide) would be good.

Implement and register a handler that makes sure tracing policy metrics are also deleted when the policy itself is deleted.

Something similar. I think the metrics deletion logic should be triggered from DeleteTracingPolicy in the sensors manager rather than from the k8s API watcher (WatchTracePolicy). That way stale metrics will also be cleaned up when policies are created via the Tetragon API rather than via the k8s CRD.
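
Very roughly, and with everything except the placement of the cleanup call stripped down to stand-ins, that could look like:

```go
package sensors

// deletePolicyMetrics points at the metric cleanup helper (e.g. the
// hypothetical DeletePolicyMetrics sketched above); wiring it up as a
// function value keeps this sketch free of a real import.
var deletePolicyMetrics func(name, namespace string)

// Manager is a stand-in for the sensor manager; the real one lives in
// pkg/sensors and carries much more state.
type Manager struct{}

// DeleteTracingPolicy sketches where the cleanup could be triggered: after
// the policy's sensors have been unloaded, drop its metric series. Doing it
// here (rather than in the k8s watcher) also covers policies created via
// the Tetragon gRPC API.
func (m *Manager) DeleteTracingPolicy(name, namespace string) error {
	// ... unload and remove the sensors belonging to the policy ...

	if deletePolicyMetrics != nil {
		deletePolicyMetrics(name, namespace)
	}
	return nil
}
```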

If I got it right, is there a deadline for resolving this issue? I would like to work on it.

No deadline. I don't recall any reports of this being a big problem, so it's just a nice-to-have. Feel free to open a PR, or post here if anything is unclear.

Sure, thanks for the pointers, I have started working on it. Please assign it to me.