linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.

Home Page: https://linkerd.io

Out of the box alerts

grampelberg opened this issue

What problem are you trying to solve?

Because the control plane ships with Prometheus, it could include some out-of-the-box alerts (just like the Grafana dashboards). These could be both per-service and per-route for all the golden metrics. The control plane would ship with the alerts configured, and all a user would need to do is configure their preferred channel.

How should the problem be solved?

  • Alertmanager as an install option.
  • Default alerting policy for the control plane.
  • Simple configuration of alert channels (Slack, email).
  • Integration with ServiceProfiles to provide alerting rules for Alertmanager.
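As an illustration of what a default per-service alert could look like, here is a minimal Python sketch that builds one Prometheus alerting rule for success rate. It assumes the Linkerd proxy exposes `response_total` with `classification`, `direction`, `namespace`, and `deployment` labels; the alert name, threshold, windows, and group name are hypothetical choices, not anything the project ships. The output is JSON, which Prometheus can read as a rule file because JSON is valid YAML.

```python
import json


def success_rate_alert(threshold=0.99, window="5m", for_duration="10m"):
    """Build one Prometheus alerting rule for per-deployment success rate.

    Assumption: the proxy metric is `response_total` with `classification`
    and `direction` labels. Check the names against your own scrape.
    """
    ok = (
        f'sum(rate(response_total{{classification="success", '
        f'direction="inbound"}}[{window}])) by (namespace, deployment)'
    )
    total = (
        f'sum(rate(response_total{{direction="inbound"}}[{window}])) '
        f"by (namespace, deployment)"
    )
    return {
        "alert": "LinkerdLowSuccessRate",  # hypothetical alert name
        "expr": f"{ok} / {total} < {threshold}",
        "for": for_duration,
        "labels": {"severity": "warning"},
        "annotations": {
            # Prometheus expands this template when the alert fires.
            "summary": "Success rate for {{ $labels.namespace }}/"
                       "{{ $labels.deployment }} is below threshold",
        },
    }


def rule_group():
    """Wrap the rule in the top-level structure of a Prometheus rule file."""
    return {
        "groups": [
            {"name": "linkerd-golden-metrics", "rules": [success_rate_alert()]}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(rule_group(), indent=2))
```

A per-route variant would presumably follow the same shape using the route-level metrics that ServiceProfiles enable, with the route added to the `by (...)` clause.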

This would be really helpful.

Grafana alerts are a pain to set up right now on linkerd2, since they do not support queries that contain variables.

see: grafana/grafana#6557

@JCMais why are you using grafana alerts instead of prometheus/alertmanager?

Grafana can send the graph image at the time of the alert as an attachment, so I was trying to get that working.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@grampelberg I would love to work on this issue as part of GSoC this year. I just had a small question: are we planning to build a web UI for configuring channels, a CLI, or both?

That's a great question. As of right now, the dashboard is 100% read-only. We should probably maintain that for the time being, as it simplifies the RBAC and security implications of the dashboard. Once the pending identity and RBAC work lands for the dashboard, it would be awesome to have the alerting config in there. From a delivery perspective, I'd personally chunk it as:

  1. Consume raw alertmanager config.
  2. Provide CLI options to assist in generating an alertmanager config for consumption by kubectl apply.
  3. Add a page in the dashboard to configure alerts.
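Step 2 above could be sketched roughly as follows: a small Python generator that turns a Slack webhook and channel into an Alertmanager config wrapped in a ConfigMap. This is only a sketch under assumptions. The ConfigMap name, namespace, receiver name, and the `alertmanager.yml` key are hypothetical, and no such Linkerd CLI option exists in this thread. The `route`, `receivers`, and `slack_configs` fields are standard Alertmanager configuration.

```python
import json


def alertmanager_config(slack_webhook_url, channel):
    """Build a minimal Alertmanager config with a single Slack receiver."""
    return {
        "route": {
            "receiver": "slack-default",  # hypothetical receiver name
            "group_by": ["alertname", "namespace"],
        },
        "receivers": [
            {
                "name": "slack-default",
                "slack_configs": [
                    {
                        "api_url": slack_webhook_url,
                        "channel": channel,
                        "send_resolved": True,
                    }
                ],
            }
        ],
    }


def configmap_manifest(config, name="linkerd-alertmanager-config",
                       namespace="linkerd"):
    """Wrap the config in a ConfigMap manifest (name and namespace assumed)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # JSON is valid YAML, so Alertmanager can load this string as its config.
        "data": {"alertmanager.yml": json.dumps(config, indent=2)},
    }


if __name__ == "__main__":
    cfg = alertmanager_config("https://hooks.slack.com/services/REPLACE_ME",
                              "#alerts")
    print(json.dumps(configmap_manifest(cfg), indent=2))
```

Since kubectl accepts JSON manifests, the output could be piped straight into `kubectl apply -f -`. In practice a webhook URL is sensitive and would more likely belong in a Secret than a ConfigMap.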

Is there a way to create alerts/recording rules on this prometheus instance?

@base698 sure, just add Alertmanager and configure it to use the Linkerd Prometheus.

Hi @grampelberg, does this issue still count for GSoC 2020?

@rajdas98 yup!

@grampelberg how do you configure the Prometheus instance? Do you just edit its ConfigMap to point at an Alertmanager you install? Will the config get overwritten during upgrades?

@base698 to configure the Linkerd Prometheus, you would edit the ConfigMap. It would be overwritten during an upgrade. I would recommend using your own Prometheus for all this today.
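For anyone following the "use your own Prometheus" route, the change amounts to adding an `alerting` section and a `rule_files` entry to that Prometheus config. Below is a minimal Python sketch of that merge on an already-parsed config. The Alertmanager address and the rule file path are assumptions for illustration; `alerting.alertmanagers[].static_configs[].targets` and `rule_files` are the standard Prometheus config keys.

```python
import copy


def with_alertmanager(prometheus_config,
                      target="alertmanager.monitoring.svc:9093",
                      rule_file="/etc/prometheus/rules/*.yml"):
    """Return a copy of a parsed Prometheus config that points at Alertmanager.

    `target` and `rule_file` are hypothetical defaults; adjust them to
    wherever your Alertmanager and rule files actually live.
    """
    cfg = copy.deepcopy(prometheus_config)  # leave the caller's dict untouched

    alertmanagers = cfg.setdefault("alerting", {}).setdefault("alertmanagers", [])
    alertmanagers.append({"static_configs": [{"targets": [target]}]})

    rule_files = cfg.setdefault("rule_files", [])
    if rule_file not in rule_files:
        rule_files.append(rule_file)
    return cfg


if __name__ == "__main__":
    base = {"global": {"scrape_interval": "10s"}}
    print(with_alertmanager(base))
```

Because the bundled ConfigMap is overwritten on upgrade, a merge like this is only durable when applied to a Prometheus you manage yourself.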

I would love to add those rules to https://awesome-prometheus-alerts.grep.to/rules as well ;)

TIL, that is super cool.

Hi, is this project still open for GSoC?