maxlerebourg / crowdsec-bouncer-traefik-plugin

Traefik plugin for Crowdsec - WAF and IP protection

[FEATURE] Allow users to specify plugin behavior on cache refreshing failure

darkweaver87 opened this issue · comments

Is your feature request related to a problem? Please describe. 🐛
We implemented a PoC with your wonderful plugin and we would like to put it in production, but we still have one remaining issue when using stream mode (switching to another mode doesn't change anything).

A free Crowdsec deployment relies on agents sending their decisions to a local API (LAPI).
This LAPI can't be scaled by design, as that would mean agents could try to send their data to an LAPI they are not registered with.

Consequently, this means that, technically speaking, we can "lose" the LAPI for a given amount of time, and it can be unavailable during the cache refresh. If that is the case, Traefik returns a 403.

Even if I tend to agree that it's good security practice to block when there is a doubt, for some services that's not really ideal. In my case, I need to allow users to access the service on such a failure.

Describe the solution you'd like

Thus, I was thinking about either:

  • allow users to specify the behavior they want when the refresh fails
  • if the refresh fails, keep the last known state for a given grace period

I will be happy to contribute, just let me know your thoughts on this :-)

Additional context

  • crowdsec version: v1.6.0
  • crowdsec plugin version: v1.2.1
  • traefik version: v2.11.2

Hi,

Thanks for the interest in the plugin, we're discussing the issue you encountered with @maxlerebourg.

bouncer.go, line 280:

	// Right here, if we cannot join the stream, we forbid the request to go on.
	if bouncer.crowdsecMode == configuration.StreamMode || bouncer.crowdsecMode == configuration.AloneMode {
		if isCrowdsecStreamHealthy {
			handleNextServeHTTP(bouncer, remoteIP, rw, req)
		} else {
			bouncer.log.Debug(fmt.Sprintf("ServeHTTP isCrowdsecStreamHealthy:false ip:%s", remoteIP))
			handleBanServeHTTP(bouncer, rw)
		}
	}

I'm thinking about an internal counter that allows the stream to be unhealthy X times before requests are answered with a 403.
The updateInterval multiplied by that counter would then give the grace period.

With some default variables, for example:
streamUnhealthyMaxTime=3
UpdateIntervalSeconds=60

So instead of blocking after 1 min when the LAPI is unreachable, requests would be blocked after 3 min.
A successful sync with the LAPI would reset that counter.

Hello,

Thank you for your feedback :-)
Looks good to me :-)

Thanks 👍

Rémi

Hi,

We're almost done implementing it. I tested the basic behavior yesterday:

  • never block if UpdateMaxFailure=-1
  • block after first fail if UpdateMaxFailure=0 (default)
  • block after 10 failed attempts if UpdateMaxFailure=10
  • unblock and reset the counter after a successful attempt
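
The four cases above can be sketched as a single decision function. This is only an illustration under assumptions: shouldBlock and neverBlock are hypothetical names, and the exact boundary for positive values (whether blocking starts on the 10th or the 11th consecutive failure when UpdateMaxFailure=10) is not stated in this thread. The sketch assumes that up to UpdateMaxFailure failures are tolerated, which is consistent with the default of 0 blocking on the first failure.

```go
package main

import "fmt"

// neverBlock mirrors the UpdateMaxFailure=-1 case from the list above.
const neverBlock = -1

// shouldBlock decides whether requests get a 403, given the configured
// UpdateMaxFailure and the current number of consecutive failed refreshes.
// Hypothetical helper, not the plugin's actual implementation.
func shouldBlock(updateMaxFailure, consecutiveFailures int) bool {
	if consecutiveFailures == 0 {
		return false // last refresh succeeded, counter was reset
	}
	if updateMaxFailure == neverBlock {
		return false // never block on refresh failure
	}
	// Assumption: tolerate up to updateMaxFailure failures, block beyond that.
	return consecutiveFailures > updateMaxFailure
}

func main() {
	cases := []struct{ max, failures int }{
		{-1, 50}, // never block
		{0, 1},   // default: block on the first failure
		{10, 10}, // still tolerated under the assumption above
		{10, 11}, // blocked
		{10, 0},  // counter reset after a successful attempt
	}
	for _, c := range cases {
		fmt.Printf("UpdateMaxFailure=%d failures=%d block=%v\n",
			c.max, c.failures, shouldBlock(c.max, c.failures))
	}
}
```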

We should merge and release a beta version very soon.