bug: Monitoring with Prometheus stack chart (can't get targets Etcd, Scheduler, Controller-manager)
aaktaev opened this issue · comments
### Summary
I installed RKE2 with Ansible and then installed the prometheus-stack chart on it. In the Prometheus targets, etcd, Scheduler, and Controller-manager cannot be reached.
### Issue Type
Bug Report
### Ansible Version
ansible 2.10.17
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.8/dist-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
### Steps to Reproduce
1. Install RKE2.
2. Install the prometheus-stack chart.
3. Check the monitoring targets in Prometheus.
### Expected Results
All services are reachable
### Actual Results
The Etcd, Scheduler, and Controller-manager targets cannot be reached from Prometheus.
From the master node I can get the metrics via:
```console
curl -L http://localhost:2381/metrics | grep -v debugging
```
Hi @aaktaev, I don't think the issue you describe is a bug in this Ansible role. It looks more like a configuration issue. Neither the scheduler nor the controller-manager exposes its endpoints by default. The etcd endpoint in RKE2 is not exposed either, and you need to enable it with a dedicated rke2-server option (I do not remember the name of that option, but you can find it in the RKE2 documentation).
By the way, if you want to change a default rke2-server option that this role does not expose as a separate variable, you can use the rke2_server_options variable instead.
@aaktaev the problem is that the components you mention are only accessible via 127.0.0.1/localhost for security reasons. You probably want to use the Rancher monitoring Helm chart: https://docs.ranchermanager.rancher.io/integrations-in-rancher/monitoring-and-alerting/how-monitoring-works
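As a rough illustration of that route, the Rancher monitoring chart scrapes the localhost-bound components through its push-proxy exporters, which are toggled per component. A minimal sketch, assuming the `rke2*` value names used by recent versions of the rancher-monitoring chart (these key names do not appear in this thread, so check them against the values.yaml of the chart version you install):

```yaml
# values.yaml excerpt for the rancher-monitoring chart (key names are assumptions)
rke2Etcd:
  enabled: true              # scrape etcd through the push proxy
rke2Scheduler:
  enabled: true              # scrape kube-scheduler through the push proxy
rke2ControllerManager:
  enabled: true              # scrape kube-controller-manager through the push proxy
rke2Proxy:
  enabled: true              # scrape kube-proxy through the push proxy
```

With this approach the components stay bound to localhost, and nothing has to be changed in the rke2-server options.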
Hi @jgoelen,
I'm wondering how I can expose, for example, kube-proxy so that Prometheus can scrape its metrics (without using the push proxy)?
@aaktaev FYI: https://repo1.dso.mil/platform-one/big-bang/bigbang/-/issues/148. Personally I would not recommend the proposed solutions :-) The services are bound to localhost for security reasons!
Hi @jgoelen,
Do you have an example values.yaml for the Rancher monitoring Helm chart? It's hard for me to find any documentation about that distribution of kube-prometheus-stack.
@hoangphuocbk we don't have a satisfying solution (yet 😁) for getting alerts for ETCD, kube-proxy, etc. Therefore we temporarily disabled the alerts.
We use the following Helm chart config:
Chart.yaml
apiVersion: v2
appVersion: "1.0"
description: A Monitoring Helm chart for RKE2
name: monitoring
version: 0.1.0
dependencies:
  - name: kube-prometheus-stack
    version: "41.5.1"
    repository: "https://prometheus-community.github.io/helm-charts"
values.yaml
kube-prometheus-stack:
  namespaceOverride: monitoring
  alertmanager:
    config:
      global:
        resolve_timeout: 5m
        slack_api_url: https://hooks.slack.com/services/******
      inhibit_rules:
        - source_matchers:
            - 'severity = critical'
          target_matchers:
            - 'severity =~ warning|info'
          equal:
            - 'namespace'
            - 'alertname'
        - source_matchers:
            - 'severity = warning'
          target_matchers:
            - 'severity = info'
          equal:
            - 'namespace'
            - 'alertname'
        - source_matchers:
            - 'alertname = InfoInhibitor'
          target_matchers:
            - 'severity = info'
          equal:
            - 'namespace'
      route:
        group_by: ['namespace']
        group_wait: 30s
        group_interval: 5m
        repeat_interval: 12h
        receiver: 'null'
        routes:
          - receiver: 'null'
            matchers:
              - alertname =~ "InfoInhibitor|Watchdog"
          - match:
            receiver: 'slack'
            continue: true
      receivers:
        - name: 'null'
        - name: 'slack'
          slack_configs:
            - channel: '#dev-cluster-alerts'
              send_resolved: false
              title: 'DEV cluster [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] Alert'
              text: >-
                {{ range .Alerts }}
                  *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
                  *Description:* {{ .Annotations.description }}
                  *Details:*
                  {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
                  {{ end }}
                {{ end }}
      templates:
        - '/etc/alertmanager/config/*.tmpl'
  grafana:
    namespaceOverride: monitoring
    admin:
      existingSecret: grafana-admin-secrets
    ingress:
      enabled: true
      hosts:
        - monitoring-dev.myorg.be
      tls:
        - hosts:
            - monitoring-dev.myorg.be
  prometheus-node-exporter:
    namespaceOverride: monitoring
    tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Exists
      - effect: NoExecute
        key: CriticalAddonsOnly
        operator: Exists
  kube-state-metrics:
    namespaceOverride: monitoring
  #
  # The following monitoring features are tmp disabled:
  #
  defaultRules:
    rules:
      etcd: false
      kubeControllerManager: false
      kubeSchedulerAlerting: false
      kubeSchedulerRecording: false
      kubeProxy: false
  kubeEtcd:
    enabled: false
    service:
      enabled: false
  kubeScheduler:
    enabled: false
    service:
      enabled: false
  kubeControllerManager:
    enabled: false
    service:
      enabled: false
  kubeProxy:
    enabled: false
    service:
      enabled: false
@aaktaev, you can add these options:
rke2_server_options:
  - "kube-controller-manager-arg: ['bind-address=0.0.0.0']"
  - "kube-scheduler-arg: ['bind-address=0.0.0.0']"
  - "kube-proxy-arg: ['metrics-bind-address=0.0.0.0:10249']"
  - "etcd-expose-metrics: true"
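For orientation, here is a sketch of what these entries are expected to look like in the RKE2 server configuration file. It assumes that the role renders each `rke2_server_options` item verbatim as one line of `/etc/rancher/rke2/config.yaml`; that rendering is an assumption about the role's template and is not confirmed anywhere in this thread:

```yaml
# /etc/rancher/rke2/config.yaml (excerpt, assumed rendering of the options above)
kube-controller-manager-arg: ['bind-address=0.0.0.0']        # listen on all interfaces instead of localhost
kube-scheduler-arg: ['bind-address=0.0.0.0']                 # same for the scheduler
kube-proxy-arg: ['metrics-bind-address=0.0.0.0:10249']       # kube-proxy metrics endpoint
etcd-expose-metrics: true                                    # expose the etcd metrics endpoint
```

Binding to 0.0.0.0 makes these endpoints reachable on every interface of the node, which is exactly what the earlier comments advise against, so it is worth restricting access with firewall rules if you go this way.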
Thanks, I already did it via:
rke2_server_options:
  - "kube-proxy-arg: --metrics-bind-address=0.0.0.0"
# (Optional) Additional RKE2 agent configuration options
# You could find the flags at https://docs.rke2.io/install/install_options/install_options/#configuring-linux-rke2-agent-nodes
# rke2_agent_options:
#   - "option: value"
rke2_agent_options:
  - "kube-proxy-arg: --metrics-bind-address=0.0.0.0"
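Once a component listens on a reachable address, the matching scrape target that was switched off in the values.yaml earlier in this thread can be turned back on. A minimal sketch, assuming the same wrapper chart with `kube-prometheus-stack` as the parent key and the usual metrics ports (2381 for etcd as in the curl command above, 10257 for the controller-manager, 10259 for the scheduler, 10249 for kube-proxy). The `serviceMonitor` TLS settings, and whether the chart's default pod selectors match RKE2's static pods, are assumptions to verify against your chart version. Enable only the components you actually exposed:

```yaml
kube-prometheus-stack:
  defaultRules:
    rules:
      etcd: true
      kubeControllerManager: true
      kubeSchedulerAlerting: true
      kubeSchedulerRecording: true
      kubeProxy: true
  kubeEtcd:
    enabled: true
    service:
      enabled: true
      port: 2381            # requires etcd-expose-metrics: true on the servers
      targetPort: 2381
  kubeControllerManager:
    enabled: true
    service:
      enabled: true
      port: 10257
      targetPort: 10257
    serviceMonitor:
      https: true                 # assumption: the secure port serves HTTPS
      insecureSkipVerify: true    # assumption: self-signed serving certificate
  kubeScheduler:
    enabled: true
    service:
      enabled: true
      port: 10259
      targetPort: 10259
    serviceMonitor:
      https: true
      insecureSkipVerify: true
  kubeProxy:
    enabled: true
    service:
      enabled: true
      port: 10249           # matches metrics-bind-address above
      targetPort: 10249
```

If a target still shows as down after this, comparing the labels on the RKE2 static pods with the selector of the Service that the chart generates is a reasonable first check.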