Add documentation for kubernetesSDConfigs usage to scrape node targets
oleksii-kalinin opened this issue · comments
Is there an existing issue for this?
- I have searched the existing issues
What happened?
Description
When I configure a ScrapeConfig to scrape node or cAdvisor metrics, the target returns a 403 error.
Steps to Reproduce
Apply the manifests attached to this issue.
Expected Result
According to the documentation, Prometheus should use the default in-cluster token and CA to communicate with the Kubernetes API.
Actual Result
server returned HTTP status 403 Forbidden
Prometheus Operator Version
v0.73.0
Kubernetes Version
v1.28.7-eks-b9c9ed7
Kubernetes Cluster Type
EKS
How did you deploy Prometheus-Operator?
prometheus-operator/kube-prometheus
Manifests
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: prometheus
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: cadvisor
  labels:
    prometheus: system-monitoring-prometheus
spec:
  scheme: HTTPS
  relabelings:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - targetLabel: __address__
      replacement: kubernetes.default.svc:443
    - sourceLabels: [__meta_kubernetes_node_name]
      regex: (.+)
      targetLabel: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
  kubernetesSDConfigs:
    - role: Node
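For context, a ScrapeConfig only takes effect when a Prometheus (or PrometheusAgent) resource selects it. The following is a minimal sketch, not the reporter's actual resource: the name and namespace are taken from the `prometheus/prometheus` key in the operator log below, and the selector is assumed to match the label on the ScrapeConfig above.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus          # assumed from the log key prometheus/prometheus
  namespace: prometheus
spec:
  serviceAccountName: prometheus
  # Select ScrapeConfig objects carrying this label (assumed to match the manifest above).
  scrapeConfigSelector:
    matchLabels:
      prometheus: system-monitoring-prometheus
  # Left unset, only ScrapeConfigs in this resource's own namespace are considered.
  # scrapeConfigNamespaceSelector: {}
```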
prometheus-operator log output
level=info ts=2024-04-16T07:02:40.311852686Z caller=main.go:186 msg="Starting Prometheus Operator" version="(version=0.73.0, branch=refs/tags/v0.73.0, revision=d70313bd17cf2a4b911222062608f793be146548)"
level=info ts=2024-04-16T07:02:40.311897365Z caller=main.go:187 build_context="(go=go1.22.1, platform=linux/amd64, user=Action-Run-ID-8551873288, date=20240404-08:50:01, tags=unknown)"
level=info ts=2024-04-16T07:02:40.311908351Z caller=main.go:198 msg="namespaces filtering configuration " config="{allow_list=\"\",deny_list=\"\",prometheus_allow_list=\"\",alertmanager_allow_list=\"\",alertmanagerconfig_allow_list=\"\",thanosruler_allow_list=\"\"}"
level=info ts=2024-04-16T07:02:40.407398652Z caller=main.go:227 msg="connection established" cluster-version=v1.28.7-eks-b9c9ed7
level=info ts=2024-04-16T07:02:40.510754118Z caller=operator.go:335 component=prometheus-controller msg="Kubernetes API capabilities" endpointslices=true
level=info ts=2024-04-16T07:02:40.527533967Z caller=operator.go:320 component=prometheusagent-controller msg="Kubernetes API capabilities" endpointslices=true
level=info ts=2024-04-16T07:02:40.704777786Z caller=server.go:298 msg="starting insecure server" address=:8080
level=info ts=2024-04-16T07:02:41.705515824Z caller=operator.go:429 component=prometheusagent-controller msg="successfully synced all caches"
level=info ts=2024-04-16T07:02:41.705725357Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:02:41.805364487Z caller=operator.go:283 component=thanos-controller msg="successfully synced all caches"
level=info ts=2024-04-16T07:02:42.00562483Z caller=operator.go:313 component=alertmanager-controller msg="successfully synced all caches"
level=info ts=2024-04-16T07:02:42.005642809Z caller=operator.go:392 component=prometheus-controller msg="successfully synced all caches"
level=info ts=2024-04-16T07:02:42.011162612Z caller=operator.go:766 component=prometheus-controller key=prometheus/prometheus msg="sync prometheus"
level=info ts=2024-04-16T07:02:42.151025532Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:02:42.304841339Z caller=operator.go:766 component=prometheus-controller key=prometheus/prometheus msg="sync prometheus"
level=info ts=2024-04-16T07:03:30.565074028Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:03:40.005947279Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:03:40.010506182Z caller=operator.go:766 component=prometheus-controller key=prometheus/prometheus msg="sync prometheus"
level=info ts=2024-04-16T07:17:18.345594596Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:17:21.582209642Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:17:22.445630324Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
Anything else?
Before moving to the operator, I configured the scrape job to use the service account token and CA directly with:
tls_config:
  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
I can't do the same with the operator because the ScrapeConfig resource offers no such options.
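For comparison, the pre-operator setup described here corresponds roughly to a native Prometheus scrape job like the one below. This is a sketch only: the job name is hypothetical, and the relabeling rules are assumed to be the same as in the ScrapeConfig from the manifests.

```yaml
scrape_configs:
  - job_name: kubernetes-cadvisor   # hypothetical job name
    scheme: https
    # Verify the API server certificate with the in-cluster CA.
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # Authenticate scrape requests with the pod's service account token.
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # Send every scrape through the API server instead of the node address.
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      # Rewrite the path to the node proxy endpoint for cAdvisor metrics.
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```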
You would need to create a service account token Secret. For example, if the Prometheus service account is named prometheus:
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: prometheus-secret
  annotations:
    kubernetes.io/service-account.name: "prometheus"
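The token in such a Secret carries whatever permissions the service account has. The manifests in this issue bind it to cluster-admin; a narrower ClusterRole for scraping nodes through the API server proxy might look like the sketch below. The role name is hypothetical, and the rule set is an assumption based on the proxy path used in the relabeling rules, so verify it against your own setup.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-node-scraper   # hypothetical name
rules:
  # Needed by Kubernetes service discovery with role: Node.
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  # Needed to GET /api/v1/nodes/<name>/proxy/metrics/cadvisor.
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
```

Note that nodes/proxy grants access to the whole kubelet API via the proxy, so it is still a sensitive permission.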
Also create a Secret for the TLS config, and use secret selectors to reference both Secrets.
Example:
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: scrape-config-kubernetes-sd-example
  namespace: default
  labels:
    app.kubernetes.io/name: scrape-config-kubernetes-sd-example
spec:
  scheme: HTTPS
  authorization:
    credentials:
      name: prometheus-secret
      key: token
  tlsConfig:
    ca:
      secret:
        name: default-server
        key: ca.crt
    insecureSkipVerify: true
  kubernetesSDConfigs:
    - role: Node
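The default-server Secret referenced by tlsConfig is not shown in this thread. A minimal sketch of what it could contain, assuming it simply holds a PEM-encoded CA certificate under the ca.crt key and lives in the same namespace as the ScrapeConfig (the certificate body is a placeholder, not real data):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: default-server
  namespace: default        # assumed: same namespace as the ScrapeConfig
type: Opaque
stringData:
  # Placeholder: paste the CA certificate that signed the target's serving cert.
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM-encoded CA certificate>
    -----END CERTIFICATE-----
```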
OK, that will probably work; however, it's not the approach described in the docs.
Yes, adding a kubernetesSDConfigs example to https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/scrapeconfig.md would be helpful.
In the meantime, https://github.com/slashpai/prometheus-operator-examples/tree/main/scrape_config/kubernetes_sd has some examples that may help.
Thanks @slashpai for the hint with the API token Secrets!
We tried to switch the Strimzi additional scrape config example to the new ScrapeConfig CR.
In addition to the bearer token, we also used the ca.crt from the API token Secret, so there was no need to set insecureSkipVerify anymore.
One of the resulting ScrapeConfigs now looks like this:
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: kubernetes-cadvisor
  labels:
    prometheus: prometheus
spec:
  ...
  authorization:
    credentials:
      name: prometheus-secret
      key: token
  ...
  tlsConfig:
    ca:
      secret:
        name: prometheus-secret
        key: ca.crt
  relabelings:
    ...
  metricRelabelings:
    ...
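Putting the pieces of this thread together, a complete cAdvisor ScrapeConfig might look like the sketch below. The relabeling rules are copied from the original report, not from the Strimzi example (whose elided sections are left as they are above), and the Secret is assumed to be the service account token Secret named prometheus-secret in the same namespace as the ScrapeConfig.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: kubernetes-cadvisor
  labels:
    prometheus: prometheus
spec:
  scheme: HTTPS
  # Bearer token read from the service account token Secret.
  authorization:
    credentials:
      name: prometheus-secret
      key: token
  # CA from the same Secret, so insecureSkipVerify is not needed.
  tlsConfig:
    ca:
      secret:
        name: prometheus-secret
        key: ca.crt
  kubernetesSDConfigs:
    - role: Node
  # Relabeling rules taken from the original report (an assumption for this example).
  relabelings:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - targetLabel: __address__
      replacement: kubernetes.default.svc:443
    - sourceLabels: [__meta_kubernetes_node_name]
      regex: (.+)
      targetLabel: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```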
Since we want to avoid long-lived API tokens, we decided to introduce a Kyverno CleanupPolicy, which removes the token Secret on a schedule:
apiVersion: kyverno.io/v2beta1
kind: CleanupPolicy
metadata:
  name: remove-api-token
spec:
  match:
    any:
      - resources:
          kinds:
            - Secret
          names:
            - prometheus-secret
  schedule: "<cron schedule>"
Our ArgoCD will recreate the Secret afterwards.