prometheus-operator / prometheus-operator

Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes

Home Page: https://prometheus-operator.dev



enforcedNamespaceLabel: don't override existing vector namespace selectors

msw-kialo opened this issue.

Component(s)

Prometheus, PrometheusRule

What is missing? Please describe.

TL;DR: an option to let enforcedNamespaceLabel leave unchanged any vector selectors in metric expressions that already have a namespace label matcher. For example, kube_pod_labels{} [...] kube_node_labels{namespace="cluster-wide-exporter"} would only be transformed to kube_pod_labels{namespace="resource-namespace"} [...] kube_node_labels{namespace="cluster-wide-exporter"}.
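A minimal Go sketch of the requested behavior follows. It assumes a deliberately simplified model. VectorSelector, Matcher and enforceLabel are hypothetical names, not prometheus-operator or Prometheus APIs. Only equality matchers are modeled, and no PromQL is parsed. With override set to true, an existing namespace matcher is replaced, which is the current behavior this issue describes. With override set to false, the matcher is only injected where it is missing.

```go
package main

import "fmt"

// Matcher is a simplified equality matcher (label="value").
// Hypothetical model: real PromQL matchers also support !=, =~ and !~.
type Matcher struct {
	Name  string
	Value string
}

// VectorSelector is a toy stand-in for a PromQL vector selector such as
// kube_node_labels{namespace="cluster-wide-exporter"}.
type VectorSelector struct {
	Metric   string
	Matchers []Matcher
}

// enforceLabel returns a copy of sel with label=value enforced.
// override=true: an existing matcher on label is replaced (current behavior).
// override=false: an existing matcher on label is kept as-is and the matcher
// is only added when absent (the "soft" behavior proposed in this issue).
func enforceLabel(sel VectorSelector, label, value string, override bool) VectorSelector {
	out := VectorSelector{Metric: sel.Metric}
	found := false
	for _, m := range sel.Matchers { // m is a copy, so sel is never mutated
		if m.Name == label {
			found = true
			if override {
				m.Value = value
			}
		}
		out.Matchers = append(out.Matchers, m)
	}
	if !found {
		out.Matchers = append(out.Matchers, Matcher{Name: label, Value: value})
	}
	return out
}

// String renders the selector in PromQL-like syntax.
func (s VectorSelector) String() string {
	str := s.Metric + "{"
	for i, m := range s.Matchers {
		if i > 0 {
			str += ","
		}
		str += fmt.Sprintf("%s=%q", m.Name, m.Value)
	}
	return str + "}"
}

func main() {
	pod := VectorSelector{Metric: "kube_pod_labels"}
	node := VectorSelector{
		Metric:   "kube_node_labels",
		Matchers: []Matcher{{Name: "namespace", Value: "cluster-wide-exporter"}},
	}
	for _, override := range []bool{true, false} {
		fmt.Println("override:", override)
		fmt.Println(" ", enforceLabel(pod, "namespace", "resource-namespace", override))
		fmt.Println(" ", enforceLabel(node, "namespace", "resource-namespace", override))
	}
}
```

With override=true, both selectors end up with namespace="resource-namespace". With override=false, kube_node_labels keeps namespace="cluster-wide-exporter". A real implementation would walk the parsed PromQL AST rather than this toy struct.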

Background:
The enforcedNamespaceLabel feature can only be excluded on a per-resource basis.
However, applications occasionally need to incorporate some cluster-global metrics (like kube_node_labels, or external metrics from e.g. AWS CloudWatch).

At the moment, the only option is to exclude these alerts from enforcement completely (via excludedFromEnforce).
This is error-prone: all other metrics need to be rewritten manually, and the output label may also have to be added manually.
Furthermore, the exclusion list can be complicated to manage: it requires listing all affected namespaces explicitly.

Ideally, PromQL vector selectors that already have a matcher on the enforced namespace label (but with a different value) would be left unchanged: the matcher would only be injected where it is missing, never overridden.

This softens the multi-tenancy feature, so an option to enable it either globally or per namespace/resource (like excludedFromEnforce) is probably a good idea.
However, this softening is presumably acceptable (at least for some organizations like ours):

  • It is already possible to query arbitrary metrics
  • The key feature is to ensure generated alerts/metrics have the desired label (so alerts are limited to the application's namespace)
  • It is still better than excluding resources completely.
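The bullets above rely on the output label still being enforced even when selectors are left alone. Below is a minimal Go sketch of that part. It assumes a hypothetical, simplified Rule type, not the PrometheusRule CRD types. The namespace label is always set on the rule's labels, overriding any existing value, so alerts stay attributed to the rule's namespace even if the expression reads cluster-global metrics.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified model of one alerting rule in a
// PrometheusRule resource; it is not the operator's actual type.
type Rule struct {
	Alert  string
	Expr   string
	Labels map[string]string
}

// addNamespaceLabel returns a copy of r whose output labels carry
// label=namespace. Any existing value for label is overwritten, so alerts
// fired by the rule are always attributed to the rule's namespace, even
// when the selectors inside Expr were left untouched.
func addNamespaceLabel(r Rule, label, namespace string) Rule {
	labels := make(map[string]string, len(r.Labels)+1)
	for k, v := range r.Labels {
		labels[k] = v
	}
	labels[label] = namespace
	r.Labels = labels // r is a copy, so the caller's map is not modified
	return r
}

func main() {
	r := Rule{
		Alert: "ExampleNodeAlert", // hypothetical alert name
		// Illustrative expression only; it is not parsed here.
		Expr:   `kube_pod_info * on(node) group_left() kube_node_labels{namespace="cluster-wide-exporter"}`,
		Labels: map[string]string{"severity": "warning"},
	}
	out := addNamespaceLabel(r, "namespace", "resource-namespace")
	fmt.Println(out.Labels) // map[namespace:resource-namespace severity:warning]
	fmt.Println(r.Labels)   // map[severity:warning]
}
```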

Describe alternatives you've considered.

Alternatively, it would be nice if the namespace field in excludedFromEnforce could be made optional: it would allow us to group alerts that need global metrics into a specially named resource (regardless of the namespace in which an application is installed).

Environment Information.

Environment

Kubernetes Version: v1.27.10
Prometheus-Operator Version: 0.71.2

I am open to contribute the needed changes myself.

> At the moment, the only option is to exclude these alerts from enforcement completely (via excludedFromEnforce).
> This is error-prone: all other metrics need to be rewritten manually, and the output label may also have to be added manually.

So "excludedFromEnforce" is still a viable option?
I see a risk with your suggestion: the modified PromQL expression might return no result because joins wouldn't match, and it becomes harder for users to find out why.
What if (as a first step) we provide a tool that rewrites PromQL expressions but doesn't overwrite existing namespace label matchers?
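The join concern can be illustrated with a toy Go model of kube_pod_info * on(node) group_left() kube_node_labels{...}. This is not PromQL's evaluator. Series, selectSeries and joinOn are hypothetical names, PromQL's cardinality rules are ignored, and the series are made up. In two of the three cases the right-hand side selects nothing: when the namespace matcher is overridden with the application namespace, and when it is hand-written with a typo. Each time the join quietly yields an empty result rather than an error. For an alerting rule, that looks exactly like "nothing to alert on".

```go
package main

import "fmt"

// Series is a simplified time series: only its label set is modeled.
type Series map[string]string

// selectSeries keeps the series whose labels equal every matcher value.
func selectSeries(all []Series, matchers map[string]string) []Series {
	var out []Series
	for _, s := range all {
		ok := true
		for k, v := range matchers {
			if s[k] != v {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, s)
		}
	}
	return out
}

// joinOn roughly emulates `left * on(label) group_left() right`: a left
// series is kept only if some right series has the same value for label.
func joinOn(left, right []Series, label string) []Series {
	var out []Series
	for _, l := range left {
		for _, r := range right {
			if l[label] == r[label] {
				out = append(out, l)
				break
			}
		}
	}
	return out
}

func main() {
	// Made-up series, for illustration only.
	db := []Series{
		{"__name__": "kube_pod_info", "namespace": "app-ns", "pod": "web-0", "node": "node-a"},
		{"__name__": "kube_node_labels", "namespace": "cluster-wide-exporter", "node": "node-a"},
	}
	pods := selectSeries(db, map[string]string{"__name__": "kube_pod_info", "namespace": "app-ns"})
	nodeSel := func(ns string) []Series {
		return selectSeries(db, map[string]string{"__name__": "kube_node_labels", "namespace": ns})
	}
	fmt.Println(len(joinOn(pods, nodeSel("app-ns"), "node")))                // overridden matcher: 0
	fmt.Println(len(joinOn(pods, nodeSel("cluster-wide-exprter"), "node")))  // soft mode with a typo: 0
	fmt.Println(len(joinOn(pods, nodeSel("cluster-wide-exporter"), "node"))) // soft mode, correct value: 1
}
```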

> Alternatively, it would be nice if the namespace field in excludedFromEnforce could be made optional: it would allow us to group alerts that need global metrics into a specially named resource (regardless of the namespace in which an application is installed).

Sorry, I don't understand this part.

> So "excludedFromEnforce" is still a viable option?

It works, but it feels more like a workaround.

For two reasons:

  1. The alerts created based on alert rules might not have a namespace label:

The enforcedNamespaceLabel feature also ensures that created alerts have a label indicating the source namespace.

If the PrometheusRule is excluded, the K8s users must ensure this manually (complicated and error-prone).

  2. It is complicated to exclude resources in arbitrary namespaces:

> Alternatively, it would be nice if the namespace field in excludedFromEnforce could be made optional: it would allow us to group alerts that need global metrics into a specially named resource (regardless of the namespace in which an application is installed).

> Sorry, I don't understand this part.

Our core application is installed multiple times in clusters (within various namespaces).
excludedFromEnforce requires that each object reference includes a namespace value.
This requires the globally installed prometheus-operator to be adapted each time to ensure we add an excludedFromEnforce entry for the specific namespace.

Ideally, we would be able to specify something like:

```yaml
- resource: prometheusrules
  name: global-alerts
```

So in whatever namespace the application is installed, it would be allowed to define a few alerts that use global vectors in their expressions.

(A regex namespace filter would be even better, but simply omitting the namespace would be sufficient.)
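Below is a minimal Go sketch of how such matching could work. It assumes a hypothetical ObjectRef type:

- An empty Namespace matches any namespace, as proposed above.
- NamespaceRegex is the optional regex extension.
- An empty Name is treated as "all objects of that resource"; this is an assumption of the sketch.

None of these fields or semantics are confirmed prometheus-operator behavior. The regex is fully anchored, following the convention Prometheus uses for its own regex label matchers.

```go
package main

import (
	"fmt"
	"regexp"
)

// ObjectRef is a hypothetical excludedFromEnforce entry. Namespace is
// optional here (empty = any namespace), and NamespaceRegex is a
// hypothetical extension; neither reflects the current API.
type ObjectRef struct {
	Resource       string
	Name           string
	Namespace      string
	NamespaceRegex string
}

// excluded reports whether the object (resource, namespace, name) matches
// any reference. An empty Name matches every object of the resource type
// (an assumption of this sketch).
func excluded(refs []ObjectRef, resource, namespace, name string) (bool, error) {
	for _, ref := range refs {
		if ref.Resource != resource {
			continue
		}
		if ref.Name != "" && ref.Name != name {
			continue
		}
		if ref.Namespace != "" && ref.Namespace != namespace {
			continue
		}
		if ref.NamespaceRegex != "" {
			// Anchor the pattern so it must match the whole namespace.
			// A real implementation would compile it once at config load.
			re, err := regexp.Compile("^(?:" + ref.NamespaceRegex + ")$")
			if err != nil {
				return false, err
			}
			if !re.MatchString(namespace) {
				continue
			}
		}
		return true, nil
	}
	return false, nil
}

func main() {
	refs := []ObjectRef{{Resource: "prometheusrules", Name: "global-alerts"}}
	for _, ns := range []string{"team-a", "team-b"} {
		ok, _ := excluded(refs, "prometheusrules", ns, "global-alerts")
		fmt.Println(ns, ok) // true in every namespace
	}
	ok, _ := excluded(refs, "prometheusrules", "team-a", "app-alerts")
	fmt.Println("app-alerts", ok) // false: name does not match
}
```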

> What if (as a first step) we provide a tool that rewrites PromQL expressions but doesn't overwrite existing namespace label matchers?

Does "a tool" mean a standalone tool? If so, yes: we should be able to incorporate such a tool into our deployment process (but a prometheus-operator option would be much simpler).

To be clear, I don't mean to change the default behavior when enforcedNamespaceLabel is used.
I imagine an option that lets users enable this softened expression rewriting if needed and wanted.

> Does "a tool" mean a standalone tool? If so, yes: we should be able to incorporate such a tool into our deployment process (but a prometheus-operator option would be much simpler).

Correct, this would be my short-term proposal.

> To be clear, I don't mean to change the default behavior when enforcedNamespaceLabel is used.
> I imagine an option that lets users enable this softened expression rewriting if needed and wanted.

Understood.

My main worries are

  • The additional complexity at the API level: every new option increases cognitive load on the maintainers and users.
  • The current approach shouldn't create side effects with joins. With the new approach, it seems harder to be sure that the modified queries return what you expect.

cc @prometheus-operator/prometheus-operator-reviewers who might have other ideas :)