[Bug]: kubernetes authentication method failing on AWS EKS
tstraley opened this issue · comments
Bug Description
Config includes:
authentication:
  methods:
    kubernetes:
      enabled: true
      required: true
When attempting to make an API call using a service account token to retrieve a Flipt client token, it eventually returns an empty reply:
curl http://10.0.10.105:8080/auth/v1/method/kubernetes/serviceaccount --data "{\"service_account_token\":\"$token\"}" -v -H "Content-Type: application/json"
* Trying 10.0.10.105:8080...
* Connected to 10.0.10.105 (10.0.10.105) port 8080 (#0)
> POST /auth/v1/method/kubernetes/serviceaccount HTTP/1.1
> Host: 10.0.10.105:8080
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 1060
>
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server
And the flipt service records this error:
{"L":"ERROR","T":"2024-04-03T22:42:33Z","M":"finished unary call with code Internal","server":"grpc","grpc.start_time":"2024-04-03T22:40:23Z","system":"grpc","span.kind":"server","grpc.service":"flipt.auth.AuthenticationMethodKubernetesService","grpc.method":"VerifyServiceAccount","peer.address":"127.0.0.1:53260","error":"rpc error: code = Internal desc = verifying service account: failed to verify signature: fetching keys oidc: get keys failed Get \"https://ip-172-16-173-141.ec2.internal:443/openid/v1/jwks\": dial tcp 172.16.173.141:443: connect: connection timed out","grpc.code":"Internal","grpc.time_ms":130569.35}
It looks like the code already understands that the URL returned by the OIDC well-known endpoint (in this case the ip-172-16-173-141.ec2.internal address) may not be reachable, and that the configured discovery URL (in this case the default kubernetes.default.svc.local) should be used instead.
Despite this attempt to handle it, the underlying OIDC provider still calls the unreachable URL as part of the updateKeys call (https://github.com/coreos/go-oidc/blob/22dfdcabd450013b4d51ac15b6423f529d957e9f/oidc/jwks.go#L230), which is in the 'Verify' codepath.
Version Info
v1.39.2
Search
- I searched for other open and closed issues before opening this
Steps to Reproduce
- Deploy flipt into kubernetes with helm chart as documented in https://docs.flipt.io/self-hosted/kubernetes
- Configure kubernetes auth method as documented in https://docs.flipt.io/configuration/authentication#kubernetes
- Attempt to use the kubernetes auth method to get a client token as documented in https://docs.flipt.io/authentication/methods#via-the-api
Expected Behavior
Expected to be able to get a client token response as detailed in https://docs.flipt.io/authentication/methods#via-the-api
Additional Context
Running in AWS EKS
Thanks for raising this @tstraley ! It has been a while since I implemented this, so it's taking me a hot minute to rebuild my context; bear with me on this one.
Just adding a bunch of context off the top of my head:
If I remember correctly, the bit we have to work around (the issuer mismatch) is simply that we instruct the go-oidc library not to return an error when the issuer described by the discovery endpoint does not match the URL we used to request that document. This is where oidc.InsecureIssuerURLContext comes in:
https://pkg.go.dev/github.com/coreos/go-oidc/v3/oidc#InsecureIssuerURLContext
The go-oidc library will return an error after fetching the discovery well-known endpoint if the host used to fetch it does not match the issuer URL in the response. We use the local k8s DNS address to get the discovery document, but it returns a JWKS URL and issuer that do not match that local k8s DNS name.
However, the go-oidc library will still use the jwks_uri from that document, which is what we're seeing here, I believe.
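A simplified illustration of the check involved (this is not go-oidc's actual code, just a sketch of the behaviour): by default the issuer claimed in the discovery document must equal the URL used to fetch it, and oidc.InsecureIssuerURLContext opts out of that comparison -- but opting out does nothing about the jwks_uri that the same document advertises.

```go
package main

import "fmt"

// verifyIssuer is a simplified stand-in (not go-oidc's real
// implementation) for the check NewProvider performs: the issuer
// in the discovery document must match the URL it was fetched
// from, unless the caller opted out via InsecureIssuerURLContext.
func verifyIssuer(requestedURL, documentIssuer string, skipCheck bool) error {
	if !skipCheck && requestedURL != documentIssuer {
		return fmt.Errorf("oidc: issuer mismatch, expected %q got %q",
			requestedURL, documentIssuer)
	}
	return nil
}

func main() {
	requested := "https://kubernetes.default.svc.local"
	claimed := "https://ip-172-16-173-141.ec2.internal:443"

	// Default behaviour: the mismatch is a hard error.
	fmt.Println("strict:  ", verifyIssuer(requested, claimed, false))

	// With the InsecureIssuerURLContext workaround the mismatch is
	// tolerated -- but the jwks_uri from the same document is still
	// what gets dialed later, which is the failure seen above.
	fmt.Println("insecure:", verifyIssuer(requested, claimed, true))
}
```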
I think this is a form of this issue: aws/containers-roadmap#2234
And it seems related to how EKS is set up to distribute service account tokens for IAM roles via its own OIDC provider:
https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html#irsa-oidc-background
The linked issue suggests that a workaround for EKS is to start the discovery journey at the custom EKS OIDC address (e.g. https://oidc.eks.<region>.amazonaws.com/id/<cluster-id>/.well-known/openid-configuration).
You can currently change this in your Flipt configuration like so:
authentication:
  methods:
    kubernetes:
      enabled: true
      discovery_url: "https://oidc.eks.<region>.amazonaws.com/id/<cluster-id>"
      required: true
Could you give this a try for us?
If this works we can make a docs update to explain this edge case a bit better.
This could be related to aws/containers-roadmap#2234
Haha, great timing!
Thanks @GeorgeMac -- this makes a lot of sense.
I tried out your suggestion. The first attempt caused the pod to crash on startup, but I was eventually able to get the relevant error (there were some red-herring "context closed" errors on a couple of pod restarts masking this one):
Error: configuring kubernetes authentication: fetching OIDC configuration: Get "https://oidc.eks.us-east-1.amazonaws.com/id/<our cluster id>/.well-known/openid-configuration": tls: failed to verify certificate: x509: certificate signed by unknown authority
This endpoint is one of AWS's public endpoints and uses a cert signed by CN=Amazon RSA 2048 M02. I could probably fetch this specific CA, put it in a k8s ConfigMap, and mount the volume; but to get by for now I added /etc/ssl/certs/ca-certificates.crt as the ca_path, since this OS CA bundle is already in the Flipt container image.
authentication:
  methods:
    kubernetes:
      enabled: true
      discovery_url: "https://oidc.eks.<region>.amazonaws.com/id/<cluster-id>"
      ca_path: "/etc/ssl/certs/ca-certificates.crt"
      required: true
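For context on what a ca_path option like this typically does under the hood, here is the standard Go stdlib pattern for loading a PEM CA bundle into a cert pool for TLS verification (Flipt's actual implementation may differ; the example self-signs a throwaway CA so it runs without touching the filesystem):

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// loadCABundle turns a PEM CA bundle (like the file a ca_path
// option points at) into a tls.Config whose RootCAs trust the
// certificates in that bundle.
func loadCABundle(pemBytes []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, fmt.Errorf("no certificates found in bundle")
	}
	return &tls.Config{RootCAs: pool}, nil
}

func main() {
	// In real use the bundle would be read from disk, e.g.:
	//   pemBytes, _ := os.ReadFile("/etc/ssl/certs/ca-certificates.crt")
	// Here we self-sign a throwaway CA so the example is self-contained.
	key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example test CA"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, _ := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	pemBytes := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})

	cfg, err := loadCABundle(pemBytes)
	fmt.Println("loaded:", err == nil && cfg.RootCAs != nil)
}
```

Mounting only the specific Amazon CA via a ConfigMap (as mentioned above) would be the tighter-scoped alternative; pointing at the OS bundle simply trusts everything the base image already trusts.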
This started up fine, and now appears to be working properly!
Please feel free to resolve and update docs as you see fit. Thanks for the help!
That's amazing, thanks for raising this and working through it!
I will open a docs issue before closing this, so we make sure to get these details in there for future folks.