prometheus-operator / prometheus-operator

Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes

Home Page: https://prometheus-operator.dev

Add documentation for kubernetesSDConfigs usage to scrape node targets

oleksii-kalinin opened this issue

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Description

When I configure a ScrapeConfig to scrape node or cAdvisor metrics, the target returns a 403 error.

Steps to Reproduce

Apply the manifests attached to this issue.

Expected Result

According to the documentation, Prometheus should use the default in-cluster token and CA to communicate with the API.

Actual Result

server returned HTTP status 403 Forbidden

Prometheus Operator Version

v0.73.0

Kubernetes Version

v1.28.7-eks-b9c9ed7

Kubernetes Cluster Type

EKS

How did you deploy Prometheus-Operator?

prometheus-operator/kube-prometheus

Manifests

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: prometheus
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: cadvisor
  labels:
    prometheus: system-monitoring-prometheus
spec:
  scheme: HTTPS
  relabelings:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - targetLabel: __address__
      replacement: kubernetes.default.svc:443
    - sourceLabels: [__meta_kubernetes_node_name]
      regex: (.+)
      targetLabel: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
  kubernetesSDConfigs:
  - role: Node
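
The manifests above bind the service account to cluster-admin, which is broader than this scrape job needs. As a rough sketch that is not part of the original report, a narrower ClusterRole for node discovery plus scraping through the API server proxy could look like this. The name prometheus-node-proxy is hypothetical, and the rule set is an assumption to check against the cluster's own requirements.

```yaml
# Hypothetical least-privilege role; the name is illustrative only.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-node-proxy
rules:
# Node service discovery (kubernetesSDConfigs with role: Node) lists and watches nodes.
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
# Requests to /api/v1/nodes/<node>/proxy/... use the nodes/proxy subresource.
- apiGroups: [""]
  resources: ["nodes/proxy"]
  verbs: ["get"]
```

These permissions only take effect for requests that actually present the service account's token, so narrowing the role does not change the 403 reported here.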

prometheus-operator log output

level=info ts=2024-04-16T07:02:40.311852686Z caller=main.go:186 msg="Starting Prometheus Operator" version="(version=0.73.0, branch=refs/tags/v0.73.0, revision=d70313bd17cf2a4b911222062608f793be146548)"
level=info ts=2024-04-16T07:02:40.311897365Z caller=main.go:187 build_context="(go=go1.22.1, platform=linux/amd64, user=Action-Run-ID-8551873288, date=20240404-08:50:01, tags=unknown)"
level=info ts=2024-04-16T07:02:40.311908351Z caller=main.go:198 msg="namespaces filtering configuration " config="{allow_list=\"\",deny_list=\"\",prometheus_allow_list=\"\",alertmanager_allow_list=\"\",alertmanagerconfig_allow_list=\"\",thanosruler_allow_list=\"\"}"
level=info ts=2024-04-16T07:02:40.407398652Z caller=main.go:227 msg="connection established" cluster-version=v1.28.7-eks-b9c9ed7
level=info ts=2024-04-16T07:02:40.510754118Z caller=operator.go:335 component=prometheus-controller msg="Kubernetes API capabilities" endpointslices=true
level=info ts=2024-04-16T07:02:40.527533967Z caller=operator.go:320 component=prometheusagent-controller msg="Kubernetes API capabilities" endpointslices=true
level=info ts=2024-04-16T07:02:40.704777786Z caller=server.go:298 msg="starting insecure server" address=:8080
level=info ts=2024-04-16T07:02:41.705515824Z caller=operator.go:429 component=prometheusagent-controller msg="successfully synced all caches"
level=info ts=2024-04-16T07:02:41.705725357Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:02:41.805364487Z caller=operator.go:283 component=thanos-controller msg="successfully synced all caches"
level=info ts=2024-04-16T07:02:42.00562483Z caller=operator.go:313 component=alertmanager-controller msg="successfully synced all caches"
level=info ts=2024-04-16T07:02:42.005642809Z caller=operator.go:392 component=prometheus-controller msg="successfully synced all caches"
level=info ts=2024-04-16T07:02:42.011162612Z caller=operator.go:766 component=prometheus-controller key=prometheus/prometheus msg="sync prometheus"
level=info ts=2024-04-16T07:02:42.151025532Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:02:42.304841339Z caller=operator.go:766 component=prometheus-controller key=prometheus/prometheus msg="sync prometheus"
level=info ts=2024-04-16T07:03:30.565074028Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:03:40.005947279Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:03:40.010506182Z caller=operator.go:766 component=prometheus-controller key=prometheus/prometheus msg="sync prometheus"
level=info ts=2024-04-16T07:17:18.345594596Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:17:21.582209642Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"
level=info ts=2024-04-16T07:17:22.445630324Z caller=operator.go:563 component=prometheusagent-controller key=prometheus/prometheus-agent msg="sync prometheus"

Anything else?

Before using the operator, I configured the scrape job to use the service account token and CA directly with:


      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

I can't do the same with the operator because ScrapeConfig has no such file-based options.
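
For comparison, the plain Prometheus scrape job that this snippet belongs to would presumably resemble the sketch below. It is assembled from the relabelings in the manifests above and the two file-based settings just quoted. The job name is an assumption.

```yaml
scrape_configs:
- job_name: cadvisor   # assumed name, mirroring the ScrapeConfig above
  scheme: https
  # Verify the API server certificate with the CA mounted into the pod.
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  # Authenticate each scrape with the pod's service account token.
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Send the scrape to the API server, which proxies it to the node.
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```

The ScrapeConfig in the manifests carries the same discovery and relabeling rules but none of the TLS or token settings, which would explain why the API server rejects the scrape.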

You would need to create a service account token Secret. For example, if the Prometheus service account is named prometheus:

apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: prometheus-secret
  annotations:
    kubernetes.io/service-account.name: "prometheus"
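
A variant of this Secret with an explicit namespace is sketched below. Kubernetes only populates a service-account-token Secret that sits in the same namespace as the ServiceAccount it names. The namespace prometheus is taken from the ClusterRoleBinding subject in the original manifests and is otherwise an assumption.

```yaml
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: prometheus-secret
  # Assumed: must match the namespace of the "prometheus" ServiceAccount.
  namespace: prometheus
  annotations:
    kubernetes.io/service-account.name: "prometheus"
# After creation, the control plane fills in data.token, data.ca.crt and data.namespace.
```

Because the populated Secret carries both token and ca.crt, it can serve as the source for the bearer token and for the CA bundle.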

Also create a Secret for the TLS config, and use Secret key selectors to reference both Secrets from the ScrapeConfig.

Example

apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: scrape-config-kubernetes-sd-example
  namespace: default
  labels:
    app.kubernetes.io/name: scrape-config-kubernetes-sd-example
spec:
  scheme: HTTPS
  authorization:
    credentials:
      name: prometheus-secret
      key: token
  tlsConfig:
    ca:
      secret:
        name: default-server
        key: ca.crt
    insecureSkipVerify: true
  kubernetesSDConfigs:
  - role: Node
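
As a possible alternative to a dedicated TLS Secret, the CA could be read from the kube-root-ca.crt ConfigMap that recent Kubernetes versions publish in every namespace. The fragment below is a sketch, not a complete manifest, and assumes that ConfigMap is present in the ScrapeConfig's namespace.

```yaml
# Fragment of a ScrapeConfig spec; merge into a full manifest before applying.
spec:
  tlsConfig:
    ca:
      configMap:
        name: kube-root-ca.crt   # assumed to exist in this namespace
        key: ca.crt
```

This CA verifies the API server certificate, so it fits setups like the original report, where __address__ is rewritten to kubernetes.default.svc:443. It is not necessarily the CA that signed the kubelet's own serving certificate.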

OK, that will probably work. However, it's not the way described in the docs.

Thanks @slashpai for the hint with the API token Secrets!

We tried to switch the Strimzi additional scrape config example to the new ScrapeConfig CR.

In addition to the bearer token, we also used the ca.crt from the API token Secret.
With that in place, insecureSkipVerify was no longer needed.

One of the resulting ScrapeConfigs now looks like this:

apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: kubernetes-cadvisor
  labels:
    prometheus: prometheus
spec:
  ...
  authorization:
    credentials:
      name: prometheus-secret
      key: token
  ...
  tlsConfig:
    ca:
      secret:
        name: prometheus-secret
        key: ca.crt
  relabelings:
  ...
  metricRelabelings:
  ...
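
Putting the pieces from this thread together, a complete cAdvisor ScrapeConfig might look like the sketch below. The relabelings are copied from the original report, not from the elided fields above, so this is not necessarily the configuration the commenter ended up with. The label value and Secret name follow the examples in this thread.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: kubernetes-cadvisor
  labels:
    prometheus: prometheus
spec:
  scheme: HTTPS
  # Bearer token sent with every scrape request to the API server.
  authorization:
    credentials:
      name: prometheus-secret
      key: token
  # CA bundle taken from the same service-account-token Secret.
  tlsConfig:
    ca:
      secret:
        name: prometheus-secret
        key: ca.crt
  kubernetesSDConfigs:
  - role: Node
  relabelings:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Route the scrape through the API server instead of contacting the kubelet directly.
  - targetLabel: __address__
    replacement: kubernetes.default.svc:443
  - sourceLabels: [__meta_kubernetes_node_name]
    regex: (.+)
    targetLabel: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```

The Secret and the ScrapeConfig should be created in the same namespace, since the key selectors name only a Secret and a key.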

Since we want to avoid long-lived API tokens, we decided to introduce a Kyverno CleanupPolicy, which removes the token Secret on a schedule:

apiVersion: kyverno.io/v2beta1
kind: CleanupPolicy
metadata:
  name: remove-api-token
spec:
  match:
    any:
    - resources:
        kinds:
        - Secret
        names:
        - prometheus-secret
  schedule: "<cron schedule>"

Our Argo CD instance recreates the Secret afterwards.