fluxcd / notification-controller

The GitOps Toolkit event forwarder and notification dispatcher

Home Page: https://fluxcd.io

Alerts to Grafana when drift detected: "error":"postMessage failed: failed to execute request: context deadline exceeded"

lgarciaharo opened this issue

We are using tfcontroller and are trying to send alerts to Grafana through Flux's notification-controller, following the documentation.

The configuration needed to send alerts with Flux is already applied (see "Integrate with Flux Receivers and Alerts").
We also have a webhook provider: drift detection works properly and the alert is received at that webhook, so I assume the event format emitted by tfcontroller is correct for Flux.

On the Grafana side, the API token is already stored as a secret in the cluster and has been tested: we can create annotations in Grafana with curl against our https://<grafana-url>/api/annotations endpoint.
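
A manual check of this kind might look roughly like the sketch below. It assumes Grafana's annotations HTTP API with a bearer token; the variable names GRAFANA_URL and GRAFANA_TOKEN, the helper annotation_payload, and the text and tag values are illustrative assumptions, not details from the original setup.

```shell
#!/bin/sh
# Build a minimal JSON body for Grafana's POST /api/annotations endpoint.
# annotation_payload is a hypothetical helper; it expects plain strings
# without quotes or backslashes, since no JSON escaping is done here.
annotation_payload() {
  printf '{"text":"%s","tags":["%s"]}' "$1" "$2"
}

payload=$(annotation_payload "drift detected (manual test)" "flux")

# Only send the request when both variables are set, so the script can be
# run as a dry run without touching a real Grafana instance.
if [ -n "${GRAFANA_URL:-}" ] && [ -n "${GRAFANA_TOKEN:-}" ]; then
  curl -sS -m 10 -X POST "${GRAFANA_URL%/}/api/annotations" \
    -H "Authorization: Bearer ${GRAFANA_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$payload"
else
  echo "dry run, payload: $payload"
fi
```

Note that a successful call from a workstation only shows that the token and endpoint are valid; it does not show that the same endpoint is reachable from the notification-controller pod.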

However, the Grafana provider and its alert aren't working.
We have enabled debug logs in the notification-controller, and we can see:

{
    "alert": {
        "name": "tf-alert-grafana",
        "namespace": "ba999"
    },
    "error": "postMessage failed: failed to execute request: context deadline exceeded",
    "eventInvolvedObject": {
        "apiVersion": "infra.contrib.fluxcd.io/v1alpha2",
        "kind": "Terraform",
        "name": "my-tf",
        "namespace": "ba999",
        "resourceVersion": "13782667",
        "uid": "8fd840ac-4bb5-4a02-8872-0df8f3e66d56"
    },
    "level": "error",
    "logger": "event-server",
    "msg": "failed to send notification",
    "stacktrace": "github.com/fluxcd/notification-controller/internal/server.(*EventServer).handleEvent.func1.1\n\tgithub.com/fluxcd/notification-controller/internal/server/event_handlers.go:248",
    "ts": "2023-09-27T16:02:30.646Z"
}

The versions are:

Flux     Argo CD   Image
v2.0.1   v2.7      v2.7.10

I've also created an issue in tfcontroller, but since the event emitted by tfcontroller works fine with the webhook provider, I'm not sure whether the problem is on the tfcontroller side or the Flux side.

Any idea what is happening?

Any help is welcome :)

Thanks!

Hello, are you sure your Grafana API is reachable from within the cluster?
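
One way to test this is to run curl from a temporary pod, so the request originates inside the cluster. The following is only a sketch: the function names, the pod name grafana-reach-test, the flux-system namespace default, and the curlimages/curl image are assumptions, and /api/health is Grafana's unauthenticated health endpoint.

```shell
#!/bin/sh
# Normalize a Grafana base URL (drop one trailing slash) and append the
# health endpoint path. grafana_health_url is a hypothetical helper name.
grafana_health_url() {
  printf '%s/api/health' "${1%/}"
}

# Run curl from a throwaway pod and print only the HTTP status code.
# $1 = Grafana base URL, $2 = namespace (assumed default: flux-system).
check_grafana_from_cluster() {
  kubectl run grafana-reach-test --rm -i --restart=Never \
    --namespace "${2:-flux-system}" --image=curlimages/curl --command -- \
    curl -sS -m 10 -o /dev/null -w '%{http_code}\n' "$(grafana_health_url "$1")"
}
```

Calling check_grafana_from_cluster "https://<grafana-url>" should print 200 when Grafana is reachable. A timeout would point to DNS, proxy, or network policy problems rather than to the token.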

Hi @lgarciaharo ,

we had a similar issue today where the notification controller was not able to call Grafana at all. No HTTP packets were sent to or received from Grafana when an event was processed, and we also received the "error": "postMessage failed: failed to execute request: context deadline exceeded" error message.

In our case the error was that we had a trailing line feed \n at the end of our Grafana service token stored in our Kubernetes secret.

This happened because we encoded it with:

echo "$token" | base64

on the command line instead of using:

echo -n "$token" | base64
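
A minimal sketch of how the difference can be checked is shown below. The token value and the helper name ends_with_lf are made up for illustration; printf '%s' is used as a portable equivalent of echo -n.

```shell
#!/bin/sh
# Hypothetical token value, used only for illustration.
token='example-service-token'

# echo appends a line feed, which gets encoded along with the token.
encoded_with_lf=$(echo "$token" | base64)
# printf '%s' emits the token bytes only.
encoded_clean=$(printf '%s' "$token" | base64)

# Succeeds if the base64 value in $1 decodes to data ending in a line feed.
ends_with_lf() {
  last_byte=$(printf '%s' "$1" | base64 --decode | tail -c 1 | od -An -tx1 | tr -d ' \n')
  [ "$last_byte" = "0a" ]
}

if ends_with_lf "$encoded_with_lf"; then echo "echo: trailing line feed"; fi
if ! ends_with_lf "$encoded_clean"; then echo "printf: clean"; fi
```

The same helper can be pointed at a secret that is already in the cluster by passing it the output of kubectl get secret <name> -n <namespace> -o jsonpath='{.data.token}', assuming the token is stored under the key token.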

Maybe this helps you find your problem.

Best regards,
Florian.

Thanks @fbuchmeier-abi, I will try it next time I touch the code and let you know.