prometheus-operator / prometheus-operator

Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes

Home Page: https://prometheus-operator.dev

Remote Write enablement

07Rajat opened this issue · comments

What happened?

Description

We are looking for a solution on remote write feature enablement.

In our case, we have multiple OpenShift clusters, and we are trying to centralize them under one Grafana dashboard.

image

The image above shows two clusters, cluster1 and cluster2, each with Prometheus installed in different namespaces: a customized prometheus-operator installation in one namespace, and the default installation that ships with OpenShift in the openshift-monitoring namespace.

Here, we are trying to remote_write the data from the default Prometheus in openshift-monitoring to the customized Prometheus server.

In the customized setup, Prometheus is installed via prometheus-operator as a separate Prometheus object, and the Prometheus service is exposed:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi

apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30900
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus

https://blog.container-solutions.com/prometheus-operator-beginners-guide

https://grafana.com/blog/2023/01/19/how-to-monitor-kubernetes-clusters-with-the-prometheus-operator/

Here, we are trying to customize the Prometheus YAML configuration; however, we are not able to change or modify anything in the StatefulSet that is generated after the Prometheus object is deployed.

We are looking for an option where we could add the remote write configuration as a ConfigMap and mount it as a volume in the customized Prometheus configuration.

ConfigMap for reference:

kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-monitoring-config
  namespace: test
  labels:
    hive.openshift.io/managed: 'true'
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
      remoteWrite:
      - url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091/api/v1/write
        oauth2:
          clientId:
            secret:
              key: client-id
              name: observatorium-credentials
          clientSecret:
            key: client-secret
            name: observatorium-credentials
          tokenUrl: https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token
        remoteTimeout: 30s
        writeRelabelConfigs:
        - sourceLabels:
          - __name__
          action: keep
          regex: (addon_operator_addons_count|addon_operator_reconcile_error|addon_operator_addon_health_info|addon_operator_ocm_api_requests_durations|addon_operator_ocm_api_requests_durations_sum|addon_operator_ocm_api_requests_durations_count|addon_operator_paused|cluster_admin_enabled|limited_support_enabled|identity_provider|cpms_enabled|ingress_canary_route_reachable|ocm_agent_service_log_sent_total|sre:slo:probe_success_api|sre:slo:probe_success_console|sre:slo:upgradeoperator_upgrade_result|sre:slo:imageregistry_http_requests_total|sre:slo:oauth_server_requests_total|sre:sla:outage_5_minutes|sre:slo:apiserver_28d_slo|sre:slo:console_28d_slo|sre:error_budget_burn:apiserver_28d_slo|sre:error_budget_burn:console_28d_slo|sre:operators:succeeded)
        queueConfig:
          capacity: 2500
          maxShards: 1000
          minShards: 1
          maxSamplesPerSend: 2000
          batchSendDeadline: 60s
          minBackoff: 30ms
          maxBackoff: 1m
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
      retention: 11d
      retentionSize: 90GB
      volumeClaimTemplate:
        metadata:
          name: prometheus-data
        spec:
          resources:
            requests:
              storage: 100Gi
    alertmanagerMain:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
      volumeClaimTemplate:
        metadata:
          name: alertmanager-data
        spec:
          resources:
            requests:
              storage: 10Gi
    telemeterClient:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
      telemeterServerURL: https://infogw.api.openshift.com
    prometheusOperator:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    grafana:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    k8sPrometheusAdapter:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    openshiftStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    thanosQuerier:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists
    monitoringPlugin:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        operator: Exists

We would really appreciate your suggestions and support.

Prometheus Operator Version

openshiftVersion: 4.13.29
kustomizeVersion: v4.5.4

Kubernetes Version

openshiftVersion: 4.13.29
kustomizeVersion: v4.5.4

Kubernetes Cluster Type

OpenShift

How did you deploy Prometheus-Operator?

Other (please comment)

Manifests

No response

prometheus-operator log output

Prometheus Operator 0.56.3 provided by Craig Trought

Anything else?

No response

If you are running Prometheus-Operator, you can specify the remote write config in the corresponding Prometheus CR itself. See here. Unless you have a very specific reason to use the config map, maybe this will help?
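As a minimal sketch of what this suggestion could look like, the Prometheus CR below carries a `remoteWrite` entry directly in its spec. The endpoint URL is a hypothetical placeholder, and the two metric names in the `keep` regex are simply taken from the longer list in the issue for brevity; neither is meant as a recommended configuration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector: {}
  remoteWrite:
    # Hypothetical endpoint: replace with the receiver's real write URL.
    - url: https://remote-storage.example.com/api/v1/write
      remoteTimeout: 30s
      writeRelabelConfigs:
        # Forward only the metrics whose name matches the regex.
        - sourceLabels:
            - __name__
          action: keep
          regex: (cluster_admin_enabled|limited_support_enabled)
```

The operator renders this into the generated Prometheus configuration, so the StatefulSet itself does not need to be edited by hand.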

In case you do want to use the config map, something like additionalScrapeConfigs lets you write your own config (in case some fields are not yet supported) in a secret and reference it in the Prometheus CR.
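For illustration, a sketch of the `additionalScrapeConfigs` mechanism mentioned above follows. The Secret name, key, job name, and target are all hypothetical. Note that this field appends raw scrape jobs to the generated configuration; it is shown here only to illustrate the "config in a secret, referenced from the CR" pattern, not as a way to inject remote write settings.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs      # hypothetical name
stringData:
  prometheus-additional.yaml: |
    # Raw Prometheus scrape_configs entries (hypothetical job and target).
    - job_name: example
      static_configs:
        - targets: ["example.test.svc:8080"]
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  additionalScrapeConfigs:
    name: additional-scrape-configs    # Secret to read from
    key: prometheus-additional.yaml    # key inside that Secret
```

Both objects are assumed to live in the same namespace, since the CR references the Secret by name only.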

I'm not sure I understand your issue. The right way to configure the OCP Prometheus is via the CMO configmap, though I'm not sure why you have https://thanos-querier.openshift-monitoring.svc.cluster.local:9091/api/v1/write as the remote-write endpoint.
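A rough sketch of the CMO route, under stated assumptions: the cluster monitoring operator reads the `cluster-monitoring-config` ConfigMap from the `openshift-monitoring` namespace (not from `test` as in the ConfigMap posted above), and the remote write URL should point at the receiver rather than at thanos-querier. The URL below assumes the customized Prometheus is reachable in-cluster through the `prometheus` Service on port 9090 in a namespace called `test`; that namespace and the plain-HTTP scheme are guesses, and a cross-cluster setup would need an externally reachable address instead.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
        # Assumed address of the customized Prometheus (Service "prometheus", port 9090).
        - url: http://prometheus.test.svc.cluster.local:9090/api/v1/write
          remoteTimeout: 30s
```

The receiving Prometheus only accepts samples on `/api/v1/write` when its remote write receiver is enabled. Recent operator versions expose this as `enableRemoteWriteReceiver: true` in the Prometheus spec; whether 0.56.3 supports that field should be checked against its CRD.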

Hi @mviswanathsai , @simonpasquier
For prometheus deployment

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus-operator
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi

And, adding to this, with the remote_write URL:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus-operator
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi
  remoteWrite:
    - url: https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091/api/v1/write
      oauth2:
        clientId:
          secret:
            key: client-id
            name: observatorium-credentials
        clientSecret:
          key: client-secret
          name: observatorium-credentials
        tokenUrl: https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token
      remoteTimeout: 30s
      writeRelabelConfigs:
      - sourceLabels:
        - __name__
        action: keep
        regex: (addon_operator_addons_count|addon_operator_reconcile_error|addon_operator_addon_health_info|addon_operator_ocm_api_requests_durations|addon_operator_ocm_api_requests_durations_sum|addon_operator_ocm_api_requests_durations_count|addon_operator_paused|cluster_admin_enabled|limited_support_enabled|identity_provider|cpms_enabled|ingress_canary_route_reachable|ocm_agent_service_log_sent_total|sre:slo:probe_success_api|sre:slo:probe_success_console|sre:slo:upgradeoperator_upgrade_result|sre:slo:imageregistry_http_requests_total|sre:slo:oauth_server_requests_total|sre:sla:outage_5_minutes|sre:slo:apiserver_28d_slo|sre:slo:console_28d_slo|sre:error_budget_burn:apiserver_28d_slo|sre:error_budget_burn:console_28d_slo|sre:operators:succeeded)

It seems the Prometheus object is preventing the StatefulSet from being created with the remote_write option. Could you please suggest a fix?

There's no such restriction. I would check the status field of the Prometheus object and the prometheus-operator logs.

Prometheus Operator 0.56.3

This is a very old version. I'd advise upgrading.

Thanks for the advice @simonpasquier, but I believe this is the latest Prometheus Operator version available in the Red Hat Marketplace.