[FEATURE] add metrics e.g. for Prometheus/Grafana

schenklklopfer opened this issue 2 months ago · comments

Is your feature request related to a problem? Please describe. 🐛
Before using your fancy tool I used fbonalair/traefik-crowdsec-bouncer or the newer thespad/traefik-crowdsec-bouncer.
Those exposed at least one metric: crowdsec_traefik_bouncer_processed_ip_total
But the more interesting information, such as how many requests have been blocked or passed and maybe the ratio between them, was not there.
I used to parse the log output with complicated LogQL queries in Loki.

The only way to do this here is to set logLevel to DEBUG and rewrite the LogQL queries to get the information from the debug log.
But I am afraid the DEBUG log level might affect the performance of the plugin, so maybe this is not the best way.

Describe the solution you'd like ✨
A way to get standardized metrics from this plugin, maybe in a standard format like Prometheus metrics.

I can imagine metrics like:

crowdsec_traefik_bouncer_plugin_requests_blocked
crowdsec_traefik_bouncer_plugin_requests_passed
crowdsec_traefik_bouncer_plugin_ips_currently_blocked
crowdsec_traefik_bouncer_plugin_ips_ever_blocked (since startup)
crowdsec_traefik_bouncer_plugin_crowdsecMode
crowdsec_traefik_bouncer_plugin_iscrowdsecstreamhealthy
crowdsec_traefik_bouncer_plugin_updatefailure_count

If used:

crowdsec_traefik_bouncer_plugin_redis_stats
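
For illustration, a few of the metrics listed above could look roughly like this in the Prometheus text exposition format. This is a minimal sketch, not something the plugin provides: the `render_metrics` helper, the help texts, and the sample values are hypothetical, and the `_total` suffix on the counters follows the usual Prometheus naming convention rather than the names proposed above.

```python
from typing import Dict, Tuple

# Hypothetical snapshot of plugin state: name -> (type, help text, value).
# Metric names are adapted from the list above; the values are made up.
SAMPLE: Dict[str, Tuple[str, str, float]] = {
    "crowdsec_traefik_bouncer_plugin_requests_blocked_total": (
        "counter", "Requests blocked since startup.", 42),
    "crowdsec_traefik_bouncer_plugin_requests_passed_total": (
        "counter", "Requests passed since startup.", 958),
    "crowdsec_traefik_bouncer_plugin_ips_currently_blocked": (
        "gauge", "IPs currently blocked.", 7),
}


def render_metrics(metrics: Dict[str, Tuple[str, str, float]]) -> str:
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (kind, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(SAMPLE), end="")
```

With counters like these, the blocked/passed ratio would not need its own metric; it could be computed in Grafana from the two counters.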

Additional context
To build dashboards in e.g. Grafana that show how much the plugin is protecting the systems.
Like this:

mathieuHa · Answer 1 · Fri Jun 07 2024 00:47:20 GMT+0800 (China Standard Time)

Hi @schenklklopfer,

We've thought about this, but we believe a plugin is not the best place to write an exporter.
We'd like to have those metrics of course, but a plugin, like a middleware, is made to take a query, do some stuff with it (in our case block or captcha) and/or let the query continue.

Developing an exporter would mean having persistent storage for the stats and lots of writes and updates in the cache (memory or Redis).
I believe it could impact performance, just like using debug mode.
It would also mean we would have to intercept each request, check the path for a /metrics endpoint, read the cache, and format it conditionally, which would make the core code more complex than it already is.
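
The extra work described above can be sketched as follows. This is a hypothetical illustration, not the plugin's code: `Bouncer`, `handle`, and `is_banned` are made-up names, and the in-memory counter stands in for the cache (memory or Redis).

```python
from collections import Counter


class Bouncer:
    """Toy middleware: decides per request and keeps stats on the side."""

    def __init__(self, banned_ips):
        self.banned_ips = set(banned_ips)
        self.stats = Counter()  # extra state written on every request

    def is_banned(self, ip):
        return ip in self.banned_ips

    def handle(self, ip, path):
        """Return (status, body) for one request."""
        # Extra branch evaluated for every request, only to serve metrics.
        if path == "/metrics":
            body = "".join(f"{k} {v}\n" for k, v in sorted(self.stats.items()))
            return 200, body
        if self.is_banned(ip):
            self.stats["requests_blocked"] += 1
            return 403, "Forbidden"
        self.stats["requests_passed"] += 1
        return 200, "passed to next handler"
```

Even in this toy version, every request pays for a path comparison and a counter write, which is the overhead and added complexity the answer is concerned about.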

CrowdSec provides native metrics: https://docs.crowdsec.net/docs/observability/prometheus. They are metrics from the LAPI and from log parsing, but they include the number of actions taken, for instance. Even without debug logs, you can use the access logs to count the number of 403 responses returned per IP.
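
As a rough sketch of the access-log approach, the snippet below counts 403 responses per client IP. It assumes Traefik access logs are written in JSON format and relies on the `ClientHost` and `DownstreamStatus` fields; the `count_403_by_ip` helper and the sample lines are made up for illustration. A 403 can also come from the backend itself, so the result only approximates the number of requests blocked by the bouncer.

```python
import json
from collections import Counter


def count_403_by_ip(lines):
    """Count responses with status 403 per client IP in JSON access logs."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not log objects
        if entry.get("DownstreamStatus") == 403:
            counts[entry.get("ClientHost", "unknown")] += 1
    return counts


# Made-up sample lines, reduced to the two fields used here.
SAMPLE_LOG = [
    '{"ClientHost": "203.0.113.5", "DownstreamStatus": 403}',
    '{"ClientHost": "203.0.113.5", "DownstreamStatus": 403}',
    '{"ClientHost": "198.51.100.1", "DownstreamStatus": 200}',
    'not json',
]

if __name__ == "__main__":
    for ip, n in count_403_by_ip(SAMPLE_LOG).most_common():
        print(ip, n)
```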

In my opinion, once you've banned an IP for, say, 4 hours for a web scan, it does not matter whether it tries 1, 10 or thousands of requests, as long as they are blocked.

I will leave this issue open to see if more people share this need, and we'll decide then.