onedr0p / home-ops

Wife approved HomeOps driven by Kubernetes and GitOps using Flux

Home Page: https://onedr0p.github.io/home-ops/

Write a script/task to import alertmanager silences upon cluster rebuild

onedr0p opened this issue

I have quite a few silences that need to be set when I rebuild my cluster:

alertname="CephNodeInconsistentMTU"
alertname="CephNodeNetworkPacketErrors"
alertname="CephMonClockSkew"
alertname="CephNodeNetworkPacketDrops"
alertname="CephNodeDiskspaceWarning",device="/dev/sda6"
alertname="CephNodeDiskspaceWarning",mountpoint="/etc/nfsmount.conf"

It would be nice to have an import script that sets these silences after a fresh cluster is installed.
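As a starting point for such a script, here is a minimal sketch that turns the matcher strings listed above into the matcher objects the Alertmanager v2 API expects. The names `SILENCES`, `PAIR` and `parse_matchers` are made up for illustration, and the parser only handles plain `name="value"` equality matchers, which is all the list above uses.

```python
import re

# Silences from the list above, written in Alertmanager matcher syntax.
SILENCES = [
    'alertname="CephNodeInconsistentMTU"',
    'alertname="CephNodeNetworkPacketErrors"',
    'alertname="CephMonClockSkew"',
    'alertname="CephNodeNetworkPacketDrops"',
    'alertname="CephNodeDiskspaceWarning",device="/dev/sda6"',
    'alertname="CephNodeDiskspaceWarning",mountpoint="/etc/nfsmount.conf"',
]

# Matches a single name="value" pair. Regex and negative matchers
# (=~, !=, !~) are deliberately not handled in this sketch.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_matchers(spec):
    """Convert 'a="x",b="y"' into a list of API v2 matcher objects."""
    return [
        {"name": name, "value": value, "isRegex": False}
        for name, value in PAIR.findall(spec)
    ]
```

Calling `parse_matchers(SILENCES[4])` yields two matcher objects, one for `alertname` and one for `device`, ready to drop into the `matchers` array of a silence.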

This can be done with curl, e.g.:

curl https://alertmanager.devbu.io/api/v2/silences -H "Content-Type: application/json" -d '{
  "matchers": [
    {
      "name": "alertname",
      "value": "CephNodeDiskspaceWarning",
      "isRegex": false
    },
    {
      "name": "device",
      "value": "/dev/sda2",
      "isRegex": false
    }
  ],
  "startsAt": "2000-01-01T00:00:00.000Z",
  "endsAt": "2100-01-01T00:00:00.000Z",
  "createdBy": "api",
  "comment": "Imported Silence",
  "status": {
    "state": "active"
  }
}'
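The same request can be sketched in Python using only the standard library, which makes it easier to loop over several silences. This is an illustration and not a tested import script: `build_silence` and `post_silence` are hypothetical helper names, the timestamps and URL are copied from the curl example, and the sketch leaves out the `status` field on the assumption that Alertmanager computes it server-side.

```python
import json
import urllib.request

# Assumption: same Alertmanager endpoint as the curl example.
ALERTMANAGER_URL = "https://alertmanager.devbu.io"


def build_silence(labels, comment="Imported Silence", created_by="api"):
    """Build the JSON body for POST /api/v2/silences from a {label: value} dict."""
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False}
            for name, value in labels.items()
        ],
        # Fixed window taken from the curl example above.
        "startsAt": "2000-01-01T00:00:00.000Z",
        "endsAt": "2100-01-01T00:00:00.000Z",
        "createdBy": created_by,
        "comment": comment,
    }


def post_silence(silence, base_url=ALERTMANAGER_URL):
    """POST one silence and return the ID reported by Alertmanager."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/silences",
        data=json.dumps(silence).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["silenceID"]
```

For example, `post_silence(build_silence({"alertname": "CephNodeDiskspaceWarning", "device": "/dev/sda2"}))` would send the same matchers as the curl call. Nothing here checks for existing silences, so running it twice would create duplicates.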

This could also be done with amtool, e.g. querying existing silences as JSON:

amtool --alertmanager.url https://alertmanager.devbu.io silence query alertname="CPUThrottlingHigh" namespace="rook-ceph" --output json

Looks like migrating to Talos solved this issue; I haven't had to silence any of these alerts.

Spoke too soon, they all started happening again.

Closing in favor of #7021