opsgenie / kubernetes-event-exporter

Export Kubernetes events to multiple destinations with routing and filtering

Missing events

cristifalcas opened this issue

The latest version is not catching most events.
Compared with version 0.10, nothing shows up in my logs now.

Hi,
Have you modified your configuration to increase the throttlePeriod to catch your missing events?

If so, please give us the value you set and an example of a missing event in JSON.

I set throttlePeriod from 5 to 300 and nothing changed.

Issues:

  1. pkg/kube/watcher.go needs to import _ "k8s.io/client-go/plugin/pkg/client/auth/gcp" in order to run it locally
  2. It throws a lot of warnings about client-side throttling. I put this in main.go to fix it (see the sketch after this list):
+       kubeconfig.QPS = 1e6
+       kubeconfig.Burst = 1e6
  3. When the config does not set a namespace, it seems to drop most of the events; very few are printed. Hardcoding the namespace shows a lot more events, but unfortunately only for that namespace.
  4. It doesn't receive events from custom sources (kube-downscaler).
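
For context, here is a minimal sketch of what that change looks like against a client-go rest.Config. The variable name kubeconfig, the way the config is built, and the values are assumptions taken from the snippet above, not the project's actual main.go; client-go's defaults are QPS 5 and Burst 10, which is why the client-side throttling warnings appear under load.

package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a rest.Config from the local kubeconfig (assumed setup; the
	// exporter's main.go may construct its config differently).
	kubeconfig, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	// Raise the client-side rate limits so label/annotation lookups are not
	// throttled by client-go's defaults (QPS 5, Burst 10).
	kubeconfig.QPS = 1e6
	kubeconfig.Burst = 1e6

	clientset, err := kubernetes.NewForConfig(kubeconfig)
	if err != nil {
		panic(err)
	}
	_ = clientset // hand this off to the watcher/informers as usual
}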

Config:

logLevel: debug
#logFormat: json
throttlePeriod: 5
# namespace: "custom-ns"
leaderElection:
  enabled: False
route:
  routes:
    - match:
        - receiver: "stdout"
receivers:
  - name: "stdout"
    stdout: {}

I ran some manual tests in a minikube cluster before I released this feature, but that small volume probably didn't amount to much. I will test it more to find out what's going on.

I don't use GCP. @cristifalcas, do other auth mechanisms need to be imported that way too? It seems similar to how database drivers are used in Go.
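
For reference, a minimal sketch of that side-effect import pattern. Whether the umbrella auth package fits this project's build is an assumption; the point is that each package registers itself in an init() function, just like database/sql drivers do.

package main

import (
	// Blank imports run only the package's init(), which registers the
	// auth provider (or SQL driver) with its parent library.
	_ "k8s.io/client-go/plugin/pkg/client/auth" // bundles the cloud auth plugins (gcp, oidc, azure, ...)

	// The analogous pattern for a database driver:
	// _ "github.com/lib/pq"
)

func main() {}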

@mustafaakin sorry, it didn't cross my mind that this is used on other clusters as well :). I don't know how it works outside GCP.

I found the slowness in my case. The calls to GetObject(reference, l.clientset, l.dynClient) break everything; I think they return too slowly. Returning nil, nil in GetLabelsWithCache and GetAnnotationsWithCache fixed all my issues.

I have to admit that there are hundreds of events per second in my cluster.
Maybe when it watches for all namespaces it gets overwhelmed when it tries to get the labels and annotations?

ETA: This was restored when I changed the throttlePeriod value 🙇
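
To illustrate the idea only (a hypothetical sketch, not the repository's actual GetLabelsWithCache code): if label lookups are cached per object reference, a burst of events for the same object costs a single API round trip instead of hundreds, which is where a busy cluster could otherwise overwhelm the exporter.

package main

import (
	"fmt"
	"sync"
)

// labelCache remembers labels per object key ("namespace/kind/name") so that
// repeated events for the same object do not each hit the API server.
type labelCache struct {
	mu    sync.Mutex
	items map[string]map[string]string
}

func newLabelCache() *labelCache {
	return &labelCache{items: make(map[string]map[string]string)}
}

// getLabels returns cached labels, calling fetch only on a cache miss; fetch
// stands in for the slow API-server round trip (GetObject in this thread).
func (c *labelCache) getLabels(key string, fetch func() (map[string]string, error)) (map[string]string, error) {
	c.mu.Lock()
	if labels, ok := c.items[key]; ok {
		c.mu.Unlock()
		return labels, nil
	}
	c.mu.Unlock()

	labels, err := fetch()
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	c.items[key] = labels
	c.mu.Unlock()
	return labels, nil
}

func main() {
	c := newLabelCache()
	fetches := 0
	for i := 0; i < 3; i++ { // three events for the same Pod trigger one fetch
		labels, _ := c.getLabels("default/Pod/example", func() (map[string]string, error) {
			fetches++
			return map[string]string{"app": "example"}, nil
		})
		fmt.Println(labels, "fetches:", fetches)
	}
}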


This is also the case in my environment. After upgrading to v0.11, event-exporter can no longer track events at all.

logLevel: error
logFormat: json
route:
  routes:
  - drop:
    - type: Normal
    match:
    - receiver: dump
      kind: Pod
receivers:
- name: dump
  stdout: {}

Most likely not related to the original issue, but I also spotted missing events: #163

Also seeing issues with throttling in 0.11:

I0103 11:46:44.912506       1 request.go:665] Waited for 1.157283329s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/apiregistration.k8s.io/v1beta1?timeout=32s

and fewer events. The issue with no events after upgrading was fixed by not setting layout when using json as the logFormat; my config looks like this:

config.yaml: |
  leaderElection: {}
  logFormat: json
  logLevel: info
  receivers:
  - file:
      path: /dev/stdout
    name: dump
  route:
    routes:
    - match:
      - receiver: dump

Having similar issues with similar configurations, using throttle periods anywhere between 1 and 300. Both the stdout and elasticsearch sinks show a similar (if not exactly the same) number of events (e.g. ~270 in the last hour), but nowhere near the actual number of events (~1700).
Watching the logs and events simultaneously, the drops seem to occur most often when a larger number of events happen concurrently or in rapid succession. Current configuration:

  config.yaml: |
    leaderElection: {}
    logFormat: json
    logLevel: info
    receivers:
    - elasticsearch:
        deDot: true
        hosts:
        - https://elasticsearch:9200
        indexFormat: events-{2006-01-02}
        tls:
          insecureSkipVerify: true
      name: elasticsearch
    - name: stdout
      stdout:
        layout:
          customEvent: |
            {{ toJson . }}
    route:
      routes:
      - match:
        - receiver: stdout
        - receiver: elasticsearch
    throttlePeriod: 10

@ncgee I am also facing the same issue: the difference between the number of events coming in and the number of events reaching the sink is far too high.
Hundreds of events come in per second, and kube-events-exporter shows them in its logs, but very few "sink" events go out.

Did you find a solution for that?

I have described everything here #192

@omauger Can you please help here?