traefik-plugins / traefik-jwt-plugin

Traefik plugin which checks JWT tokens for required fields. Supports Open Policy Agent (OPA) and signature validation with JWKS

Plugin fetches jwkEndpoints a lot of times over a short period of time

cmartell-at-ocp opened this issue · comments

Hello,

We're running into an issue where the plugin will start fetching jwkEndpoints a lot, as opposed to every 15 minutes as I'd expect based on the source code in jwt.go.

I initially discovered this through tracing, and confirmed it by enabling DEBUG level logs in Traefik.

This is how we are configuring the Middleware (headers and namespace cleaned for simplicity):

spec:
  plugin:
    jwt:
      JwtHeaders:
        x-head-aud: aud
        x-head-exp: exp
        x-head-sub: sub
      Keys:
        - http://http.<namespace>/oidc/v1/jwks
      PayloadFields:
        - sub
        - exp
        - aud
      Required: true

Checking the debug log, I noticed "time" is zeroed, and ip, port, url and sub are empty. I wonder if the plugin doesn't really support internal services as key providers?

Note that everything else seems to work fine with the Middleware, but the constant fetching of the jwks seems to get worse the longer we let the traefik pod run. Once we restart the pod, it seems to go back down to every 15 minutes for a little while before it starts querying jwks non-stop.

traefik.log

Hello
As a maintainer I am not using this functionality.
Are you willing to provide a PR or an investigation?

Willing to assist in whichever way I can.

This is the part of the code that I'd expect will keep the keys "cached" for 15 minutes: https://github.com/traefik-plugins/traefik-jwt-plugin/blob/main/jwt.go#L198-L203

But I'm not sure how Traefik handles plugins. Does it load a copy of the plugin for every new request? Is the plugin reused across requests? I wonder if it is simply a matter of old running plugins not being "cleaned up" after a request is completed.

That may be the case. I've been fixing a caching issue in another plugin.
Since the Traefik docs don't explain this properly, it can only be settled by experiment.

Are you able to research this?

Hi

We ran into a similar situation. We use a lot of dynamic configurations (file provider) for each of our customers. We deploy them all at the same time.
This is from our daily access.log:

[root@proxy logs]# grep "/certs" access.log | jq .'RequestPath' | cut -d'/' -f3 | sort | uniq -c
   8021 customer1-prod
  20358 customer2-prod
  20358 customer3-prod
  20360 customer4-prod
  20358 customer5-prod
  ...
  20359 customerX-prod
    123 other-service

These are requests to the URL configured in the key section of the plugin (see below), grouped by customer name. Interestingly, customer1-prod is one of our newest customers.

We suspect that this high number of requests is caused by the dynamic files: every time the config changes or reloads, a new BackgroundRefresh() routine is spawned while the old ones keep running. The other-service does not get deployed that often, so its count is smaller. We do not restart Traefik after our deployments.

Current middleware config:

http:
  middlewares:
    # Validate access token using Traefik Plugin.
    web-token:
      plugin:
        token:
          PayloadFields: ["exp", "sub", "sid", "iss"]
          Required: true
          JwtCookieKey: token
          JwtQueryKey: token
          Keys:
            - https://example.com/realms/customer1-prod/protocol/openid-connect/certs

Maybe this helps.

Nice findings @spezifanta.

From my end, what I noticed is that Traefik 3.0.0 (beta and rc versions) spams this a lot more than Traefik 2.x. We only use Traefik 3 in one of our clusters, so it could also be that something changed between v2 and v3 in how plugins / middleware are used.

We did a bit more digging and forked the plugin.

At the moment we are pretty sure that goroutines are not garbage collected when a dynamic config is reloaded.

Basically, we added a unique identifier every time BackgroundRefresh() is called to see what's going on.

func (jwtPlugin *JwtPlugin) BackgroundRefresh(context context.Context, identifier string) {
	for {
		logInfo(fmt.Sprintf("Started BackgroundRefresh %s", identifier)).print()
		select {
		case <-context.Done():
			// This never gets called?!
			logInfo(fmt.Sprintf("Ended BackgroundRefresh %s", identifier)).print()
			return
		default:
			logInfo(fmt.Sprintf("Do BackgroundRefresh %s", identifier)).print()
			jwtPlugin.FetchKeys()
			time.Sleep(15 * time.Minute) // 15 min
		}
	}
}

This is the output after starting Traefik and changing a dynamic config two times.

**Created** BackgroundRefresh 8784B9863A8DB17BC71EB3F0686CE46E 
Started BackgroundRefresh 8784B9863A8DB17BC71EB3F0686CE46E 
Do BackgroundRefresh 8784B9863A8DB17BC71EB3F0686CE46E      
**Created** BackgroundRefresh CD204DA243721FF35D231DAF2E7EA5C2 
Started BackgroundRefresh CD204DA243721FF35D231DAF2E7EA5C2 
Do BackgroundRefresh CD204DA243721FF35D231DAF2E7EA5C2      
**Created** BackgroundRefresh 1732504C1B0750675E36DB70EE586821 
Started BackgroundRefresh 1732504C1B0750675E36DB70EE586821 
Do BackgroundRefresh 1732504C1B0750675E36DB70EE586821      
Started BackgroundRefresh 8784B9863A8DB17BC71EB3F0686CE46E 
Do BackgroundRefresh 8784B9863A8DB17BC71EB3F0686CE46E      
Started BackgroundRefresh CD204DA243721FF35D231DAF2E7EA5C2 
Do BackgroundRefresh CD204DA243721FF35D231DAF2E7EA5C2      
Started BackgroundRefresh 1732504C1B0750675E36DB70EE586821 
Do BackgroundRefresh 1732504C1B0750675E36DB70EE586821      
Started BackgroundRefresh 8784B9863A8DB17BC71EB3F0686CE46E 
Do BackgroundRefresh 8784B9863A8DB17BC71EB3F0686CE46E      
Started BackgroundRefresh CD204DA243721FF35D231DAF2E7EA5C2

You end up with three different BackgroundRefresh() jobs running at the same time. This should not happen.
We have not figured out how to clean up this routine.

Looks like we are not the only one

So does it make sense to introduce a private package-level variable that holds the goroutine's context, so that on plugin start it can be cancelled if it already exists?

Or maybe leverage an existing cron library and keep its handler in a private package-level variable as well, so it can be cancelled or rescheduled on plugin start.

Since I couldn't get anything to work with runtime.SetFinalizer(), I drafted a PR (#68) that will (at least in our scenario) mitigate this issue greatly. Hope it helps. Maybe someone can come up with a more complete solution. But the Traefik project seems unwilling to help with this issue from their side so far.