bug: Monitoring with Prometheus stack chart (can't get targets Etcd, Scheduler, Controller-manager )

Question

aaktaev opened this issue 2 years ago

### Summary

I installed RKE2 with Ansible and then installed the prometheus-stack chart on the cluster. In the Prometheus targets, etcd, Scheduler, and Controller-manager can't be reached.

### Issue Type

Bug Report

### Ansible Version

ansible 2.10.17
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.8/dist-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]



### Steps to Reproduce

1. Install rke2.
2. Install the prometheus-stack chart.
3. Check the monitoring targets in Prometheus.


### Expected Results

All services are reachable

### Actual Results

```console
can't get targets Etcd, Scheduler, Controller-manager
```

aaktaev · Answer 1 · Sat Nov 05 2022 00:00:24 GMT+0800 (China Standard Time)

From the master node I can get the metrics via:
curl -L http://localhost:2381/metrics | grep -v debugging

Michal Muransky · Answer 2 · Sat Nov 05 2022 01:27:54 GMT+0800 (China Standard Time)

Hi @aaktaev, I don't think the issue you describe is a bug in this Ansible role. It looks more like a configuration issue. Neither the scheduler nor the controller-manager exposes its endpoint by default. The etcd endpoint in RKE2 is not exposed either, and you need to enable it with a dedicated rke2-server option (I don't remember the name of that option, but you can find it in the RKE2 documentation).

By the way, if you want to change a default rke2-server option that this role does not expose as a separate variable, you can use the rke2_server_options variable instead.

Jurgen Goelen · Answer 3 · Tue Nov 08 2022 16:16:51 GMT+0800 (China Standard Time)

@aaktaev the problem is that the components you mention are only accessible via 127.0.0.1/localhost for security reasons. You probably want to use the Rancher monitoring Helm chart: https://docs.ranchermanager.rancher.io/integrations-in-rancher/monitoring-and-alerting/how-monitoring-works

aaktaev · Answer 4 · Tue Nov 08 2022 16:31:51 GMT+0800 (China Standard Time)

@jgoelen, hi,

How can I expose, for example, kube-proxy so that Prometheus can scrape its metrics (without using the push proxy)?

Jurgen Goelen · Answer 5 · Tue Nov 08 2022 18:09:20 GMT+0800 (China Standard Time)

@aaktaev FYI: https://repo1.dso.mil/platform-one/big-bang/bigbang/-/issues/148. Personally I would not recommend the proposed solutions :-) The services are bound to localhost for security reasons!

Phuoc Hoang · Answer 6 · Tue Nov 22 2022 16:39:45 GMT+0800 (China Standard Time)

Hi @jgoelen,
Do you have an example values.yaml for the Rancher monitoring Helm chart? It's hard for me to find any documentation on this kube-prometheus-stack distro.

Jurgen Goelen · Answer 7 · Tue Nov 22 2022 17:31:15 GMT+0800 (China Standard Time)

@hoangphuocbk we don't have a satisfying solution (yet 😁) for getting alerts for ETCD, kube-proxy, etc. Therefore we temporarily disabled the alerts.

We use the following Helm chart config:

Chart.yaml

apiVersion: v2
appVersion: "1.0"
description: A Monitoring Helm chart for RKE2
name: monitoring
version: 0.1.0
dependencies:
    - name: kube-prometheus-stack
      version: "41.5.1"
      repository: "https://prometheus-community.github.io/helm-charts"

values.yaml

kube-prometheus-stack:
  namespaceOverride: monitoring
  alertmanager:
    config:
      global:
        resolve_timeout: 5m
        slack_api_url: https://hooks.slack.com/services/******
      inhibit_rules:
        - source_matchers:
            - 'severity = critical'
          target_matchers:
            - 'severity =~ warning|info'
          equal:
            - 'namespace'
            - 'alertname'
        - source_matchers:
            - 'severity = warning'
          target_matchers:
            - 'severity = info'
          equal:
            - 'namespace'
            - 'alertname'
        - source_matchers:
            - 'alertname = InfoInhibitor'
          target_matchers:
            - 'severity = info'
          equal:
            - 'namespace'
      route:
        group_by: ['namespace']
        group_wait: 30s
        group_interval: 5m
        repeat_interval: 12h
        receiver: 'null'
        routes:
        - receiver: 'null'
          matchers:
            - alertname =~ "InfoInhibitor|Watchdog"
        - match:
          receiver: 'slack'
          continue: true
      receivers:
      - name: 'null'
      - name: 'slack'
        slack_configs:
        - channel: '#dev-cluster-alerts'
          send_resolved: false
          title: 'DEV cluster [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] Alert'
          text: >-
            {{ range .Alerts }}
              *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
              *Description:* {{ .Annotations.description }}
              *Details:*
              {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
              {{ end }}
            {{ end }}
      templates:
      - '/etc/alertmanager/config/*.tmpl'
  grafana:
    namespaceOverride: monitoring
    admin:
      existingSecret: grafana-admin-secrets
    ingress:
      enabled: true
      hosts:
        - monitoring-dev.myorg.be
      tls:
        - hosts:
          - monitoring-dev.myorg.be
  prometheus-node-exporter:
    namespaceOverride: monitoring
    tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Exists
      - effect: NoExecute
        key: CriticalAddonsOnly
        operator: Exists
  kube-state-metrics:
    namespaceOverride: monitoring
  #
  # The following monitoring features are temporarily disabled:
  #
  defaultRules:
    rules:
      etcd: false
      kubeControllerManager: false
      kubeSchedulerAlerting: false
      kubeSchedulerRecording: false
      kubeProxy: false
  kubeEtcd:
    enabled: false
    service:
      enabled: false
  kubeScheduler:
    enabled: false
    service:
      enabled: false
  kubeControllerManager:
    enabled: false
    service:
      enabled: false
  kubeProxy:
    enabled: false
    service:
      enabled: false

Phuoc Hoang · Answer 8 · Thu Dec 01 2022 21:54:12 GMT+0800 (China Standard Time)

@aaktaev, you can add these options:

    rke2_server_options:
      - "kube-controller-manager-arg: ['bind-address=0.0.0.0']"
      - "kube-scheduler-arg: ['bind-address=0.0.0.0']"
      - "kube-proxy-arg: ['metrics-bind-address=0.0.0.0:10249']"
      - "etcd-expose-metrics: true"
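
For reference, here is a sketch of how those options would look in the RKE2 server config file itself, assuming the role writes each list entry more or less verbatim into `/etc/rancher/rke2/config.yaml` (the file path and the rendering are assumptions, so check the generated file on your server node). Note that this makes the components listen on all interfaces, which is exactly the trade-off discussed earlier in the thread.

```yaml
# /etc/rancher/rke2/config.yaml on a server node (assumed location)
# Each *-arg key takes a list of "flag=value" strings passed to the component.
kube-controller-manager-arg:
  - "bind-address=0.0.0.0"                 # listen on all interfaces instead of loopback
kube-scheduler-arg:
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0:10249"   # kube-proxy metrics endpoint
etcd-expose-metrics: true                  # expose etcd metrics beyond localhost
```

After changing the file, the rke2-server service has to be restarted for the new arguments to take effect.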

aaktaev · Answer 9 · Thu Dec 01 2022 22:11:04 GMT+0800 (China Standard Time)

Thanks, I already did it via:

rke2_server_options:
  - "kube-proxy-arg: --metrics-bind-address=0.0.0.0"


# (Optional) Additional RKE2 agent configuration options
# You could find the flags at https://docs.rke2.io/install/install_options/install_options/#configuring-linux-rke2-agent-nodes
# rke2_agent_options:
#   - "option: value"
rke2_agent_options:
  - "kube-proxy-arg: --metrics-bind-address=0.0.0.0"
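
Once the components listen on a non-loopback address, the scrape targets that were disabled in the values.yaml earlier in this thread could be switched back on. The following is only a rough sketch under several assumptions: the IP addresses are hypothetical placeholders, the ports are the usual upstream defaults (2381 for etcd metrics, 10257 for controller-manager, 10259 for scheduler, 10249 for kube-proxy), and the keys follow the kube-prometheus-stack values layout, so verify them against the chart version you deploy.

```yaml
# Sketch only: re-enable control-plane targets in the umbrella chart's values.yaml.
# 10.0.0.11 (server node) and 10.0.0.21 (agent node) are hypothetical addresses.
kube-prometheus-stack:
  kubeEtcd:
    enabled: true
    endpoints:
      - 10.0.0.11
    service:
      enabled: true
      port: 2381          # plain-HTTP metrics port, same as the curl test on localhost
      targetPort: 2381
  kubeControllerManager:
    enabled: true
    endpoints:
      - 10.0.0.11
    service:
      enabled: true
      port: 10257
      targetPort: 10257
    serviceMonitor:
      https: true
      insecureSkipVerify: true   # assumption: serving cert is not trusted by Prometheus
  kubeScheduler:
    enabled: true
    endpoints:
      - 10.0.0.11
    service:
      enabled: true
      port: 10259
      targetPort: 10259
    serviceMonitor:
      https: true
      insecureSkipVerify: true   # same assumption as above
  kubeProxy:
    enabled: true
    endpoints:                   # kube-proxy runs on agent nodes as well
      - 10.0.0.11
      - 10.0.0.21
    service:
      enabled: true
      port: 10249
      targetPort: 10249
```

The matching entries under `defaultRules.rules` would also need to be set back to `true` to get the alerts again.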