enforcedNamespaceLabel: don't override existing vector namespace selectors
msw-kialo opened this issue · comments
Component(s)
Prometheus, PrometheusRule
What is missing? Please describe.
TLDR: an option to let enforcedNamespaceLabel leave metric expressions that already have the label selector unchanged (expressions like kube_pod_labels{} [...] kube_node_labels{namespace="cluster-wide-exporter"} are only transformed to kube_pod_labels{namespace="resource-namespace"} [...] kube_node_labels{namespace="cluster-wide-exporter"}).
Background:
The enforcedNamespaceLabel feature can only be disabled on a per-resource basis.
However, occasionally, applications might want to incorporate some cluster-global metrics (like kube_node_labels or external metrics from e.g. AWS CloudWatch).
At the moment, the only option is to exclude these alerts from enforcement completely (via excludedFromEnforce).
This is error-prone, as all other metrics need to be rewritten manually and the output label may also have to be added manually.
Furthermore, the exclusion list can be complicated to manage: it requires listing all affected namespaces explicitly.
Ideally, it would be possible to leave PromQL vector selectors that already have a matcher for the namespace label (but with a different value) unchanged: the selector is only injected where it is missing and never overrides an existing one.
This softens the multi-tenancy feature, so an option to enable it either globally or per namespace/resource (like excludedFromEnforce) is probably a good idea.
However, this softening is presumably acceptable (at least for some organizations like ours):
- It is already possible to query arbitrary metrics
- The key feature is to ensure generated alerts/metrics have the desired label (so alerts are limited to the application's namespace)
- It is still better than excluding resources from enforcement completely.
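The non-overriding behaviour proposed above can be illustrated with a small Python sketch. It is not prometheus-operator code and it does not parse PromQL: the Matcher type and the enforce and render helpers are hypothetical names, and each vector selector is modelled as a plain list of label matchers. With override=True an existing namespace matcher is replaced (the overriding behaviour this issue describes); with override=False the matcher is only injected when the selector has no matcher for that label yet.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Matcher:
    """One label matcher of a vector selector (hypothetical helper type)."""
    name: str
    op: str      # one of "=", "!=", "=~", "!~"
    value: str


def enforce(matchers: List[Matcher], label: str, value: str,
            override: bool = True) -> List[Matcher]:
    """Return a new matcher list in which `label` is constrained.

    override=True:  drop any existing matcher on `label`, then add label="value".
    override=False: keep existing matchers on `label` untouched and add
                    label="value" only if the selector has none (assumed
                    semantics of the option proposed in this issue).
    """
    has_label = any(m.name == label for m in matchers)
    if has_label and not override:
        return list(matchers)
    kept = [m for m in matchers if m.name != label]
    return kept + [Matcher(label, "=", value)]


def render(metric: str, matchers: List[Matcher]) -> str:
    """Render a selector as text (no escaping of values, for brevity)."""
    inner = ",".join(f'{m.name}{m.op}"{m.value}"' for m in matchers)
    return f"{metric}{{{inner}}}"


pod = []  # kube_pod_labels{}
node = [Matcher("namespace", "=", "cluster-wide-exporter")]

print(render("kube_pod_labels", enforce(pod, "namespace", "resource-namespace", override=False)))
# kube_pod_labels{namespace="resource-namespace"}
print(render("kube_node_labels", enforce(node, "namespace", "resource-namespace", override=False)))
# kube_node_labels{namespace="cluster-wide-exporter"}
```

Working on matcher lists instead of raw strings keeps the sketch independent of any particular PromQL parser; a real implementation would have to walk the parsed expression and apply the same rule to every vector selector.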
Describe alternatives you've considered.
Alternatively, it would be nice if the namespace field in excludedFromEnforce could be made optional: it would allow us to aggregate alerts that need global metrics into a specially named resource (independently of the specific namespace an application is installed in).
Environment Information.
Environment
Kubernetes Version: v1.27.10
Prometheus-Operator Version: 0.71.2
I am open to contributing the needed changes myself.
> At the moment, the only option is to ignore these alerts completely (via excludedFromEnforce).
> This is error-prone: as all other metrics need to be rewritten manually and also the output label must potentially be added manually.
So "excludedFromEnforce" is still a viable option?
I see a risk with your suggestion: the modified PromQL expression may return no result because joins wouldn't match, and it becomes harder for users to find out why.
What if (as a first step) we provide a tool which can rewrite PromQL expressions but wouldn't overwrite existing namespace label matchers?
> Alternatively, it would be nice if the namespace field in excludedFromEnforce could be made optional: it would allow us to aggregate alerts that need global metrics into a specially named resource (independently of the specific namespace an application is installed).
Sorry, I don't understand this part.
> So "excludedFromEnforce" is still a viable option?
It works. But it feels more like a workaround.
For two reasons:
- The alerts created based on alert rules might not have a namespace label: the enforcedNamespaceLabel feature also ensures that created alerts have a label indicating the source namespace. If the PrometheusRule is excluded, the K8s users must ensure this manually (complicated and error-prone).
- It is complicated to exclude resources in arbitrary namespaces:
> Alternatively, it would be nice if the namespace field in excludedFromEnforce could be made optional: it would allow us to aggregate alerts that need global metrics into a specially named resource (independently of the specific namespace an application is installed).
> Sorry, I don't understand this part.
Our core application is installed multiple times in clusters (within various namespaces).
excludedFromEnforce requires that each object reference includes a namespace value.
This requires the globally installed prometheus-operator to be adapted each time to ensure we add an excludedFromEnforce entry for the specific namespace.
Ideally, we would be able to specify something like:
- resource: prometheusrules
  name: global-alerts
So in whatever namespace the application is installed, it is allowed to define a few alerts that use global vectors in their expressions.
(A regex namespace filter would be even better, but making the namespace optional would be sufficient.)
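To make the requested matching rule concrete, here is a minimal Python sketch. It does not reflect how prometheus-operator evaluates excludedFromEnforce today: ObjectRef and is_excluded are hypothetical names, and treating a missing namespace as "any namespace" is exactly the assumption this proposal asks for.

```python
from typing import Iterable, NamedTuple, Optional


class ObjectRef(NamedTuple):
    """Simplified stand-in for an excludedFromEnforce entry (hypothetical)."""
    resource: str
    name: str
    namespace: Optional[str] = None  # assumption: None matches any namespace


def is_excluded(refs: Iterable[ObjectRef], resource: str,
                namespace: str, name: str) -> bool:
    """Return True if the object is covered by one of the exclusion entries."""
    return any(
        ref.resource == resource
        and ref.name == name
        and ref.namespace in (None, namespace)
        for ref in refs
    )


exclusions = [
    # Entry without a namespace: applies wherever the application is installed.
    ObjectRef(resource="prometheusrules", name="global-alerts"),
    # Entry with a namespace: applies to that namespace only.
    ObjectRef(resource="prometheusrules", name="legacy-alerts", namespace="team-a"),
]

print(is_excluded(exclusions, "prometheusrules", "team-b", "global-alerts"))  # True
print(is_excluded(exclusions, "prometheusrules", "team-b", "legacy-alerts"))  # False
```

With such a rule the exclusion list would no longer need a new entry for every namespace the application is installed in; only the resource name (global-alerts in the example above) would have to be agreed on.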
> What if (as a first step) we provide a tool which can rewrite PromQL expressions but wouldn't overwrite existing namespace label matchers.
"A tool" means a standalone tool? Yes, we should be able to incorporate such a tool into our deployment process (but a prometheus-operator option would be much simpler).
To be clear, I don't mean to change the default behavior when enforcedNamespaceLabel is used.
I imagine an option that allows users to enable this softened expression adaptation if needed and wanted.
> "A tool" means a standalone tool? Yes, we should be able to incorporate such a tool into our deployment process (but a prometheus-operator option would be much simpler).
Correct, this would be my short-term proposal.
> To be clear, I don't mean to change the default behavior when enforcedNamespaceLabel is used.
> I imagine an option that allows users to enable this softened expression adaptation if needed and wanted.
Understood.
My main worries are
- The additional complexity at the API level: every new option increases cognitive load on the maintainers and users.
- The current approach shouldn't create side effects with joins. With the new approach, it seems harder to be sure that the modified queries return what you expect.
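The join concern can be illustrated with a toy Python model of one-to-one vector matching. This is only a sketch under simplifying assumptions, not how Prometheus evaluates binary operations: signature and join are hypothetical helpers, samples are (labels, value) pairs, and the two series are made up for the example. When both sides carry different namespace values, matching on the namespace label silently yields an empty result, while matching on node alone still works.

```python
def signature(labels, on):
    """Matching signature of a sample: the values of the labels listed in `on`."""
    return tuple((name, labels.get(name, "")) for name in on)


def join(left, right, on):
    """Toy one-to-one match: multiply samples whose signatures are equal."""
    index = {signature(labels, on): value for labels, value in right}
    result = []
    for labels, value in left:
        key = signature(labels, on)
        if key in index:
            result.append((dict(key), value * index[key]))
    return result


# Pod series rewritten to the resource namespace.
pods = [({"namespace": "resource-namespace", "node": "node-1", "pod": "web-0"}, 1.0)]
# Node series that kept its own namespace matcher.
nodes = [({"namespace": "cluster-wide-exporter", "node": "node-1"}, 1.0)]

print(join(pods, nodes, on=["namespace", "node"]))  # []
print(join(pods, nodes, on=["node"]))               # [({'node': 'node-1'}, 1.0)]
```

An empty result produces no error, which is why a mismatch of this kind would be hard for users to notice.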
cc @prometheus-operator/prometheus-operator-reviewers who might have other ideas :)