nginxinc / nginx-s3-gateway

NGINX S3 Caching Gateway

Assume role with web identity not working (IRSA on AWS EKS)

westernspion opened this issue

Describe the bug
Assuming a role with web identity does not work as an authorization scheme. When the gateway is configured to use web identity (via IRSA on EKS), Nginx logs the following errors while trying to assume the role:

2023/05/24 02:38:30 [error] 79#79: *1 js http fetch SSL certificate verify error: (20:unable to get local issuer certificate) while SSL handshaking to fetch target, client: 127.0.0.1, server: , request: "HEAD /my-file.txt HTTP/1.1", subrequest: "/aws/credentials/retrieve", host: "localhost:61465"
2023/05/24 02:38:30 [info] 79#79: *1 js: Could not assume role using web identity: {}
2023/05/24 02:38:30 [error] 79#79: *1 auth request unexpected status: 500 while SSL handshaking to fetch target, client: 127.0.0.1, server: , request: "HEAD /my-file.txt HTTP/1.1", host: "localhost:61465"

I logged the STS API call that the Nginx JavaScript code makes and replayed it with curl inside the same container, where it works just fine. This leads me to believe that the SSL configuration in Nginx is wrong, but I can't work out what.
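For anyone reproducing the manual check, it can be sketched roughly as below. This is an illustration rather than the gateway's own code: `build_sts_url` is a hypothetical helper, the `STS_ENDPOINT` override and its global default are assumptions, and `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` are the variables IRSA normally injects into the pod.

```shell
#!/bin/sh
# Hypothetical helper: build an AssumeRoleWithWebIdentity request URL.
# Arguments: role ARN, role session name, web identity token (JWT).
# Assumption: the global STS endpoint is used unless STS_ENDPOINT is set.
build_sts_url() {
    role_arn="$1"
    session_name="$2"
    token="$3"
    endpoint="${STS_ENDPOINT:-https://sts.amazonaws.com}"
    printf '%s/?Action=AssumeRoleWithWebIdentity&Version=2011-06-15&RoleArn=%s&RoleSessionName=%s&WebIdentityToken=%s' \
        "$endpoint" "$role_arn" "$session_name" "$token"
}

# Usage inside the pod (left commented out because it needs network access):
# curl -sS "$(build_sts_url "$AWS_ROLE_ARN" nginx-s3-proxy \
#     "$(cat "$AWS_WEB_IDENTITY_TOKEN_FILE")")"
```

A successful curl here only shows that the system CA bundle trusts STS. The njs fetch call verifies certificates against the file named by `js_fetch_trusted_certificate`, so the two can disagree.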

To Reproduce
Configure a pod on EKS to run the nginx-s3-gateway with a correctly configured IRSA role, like so:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-s3-gateway
  labels:
    app: nginx-s3-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-s3-gateway
  template:
    metadata:
      labels:
        app: nginx-s3-gateway
    spec:
      serviceAccount: my-irsa-enabled-service-account
      containers:
      - name: nginx
        image: my-nginx-s3-gateway-image:latest
        ports:
        - containerPort: 80
        env:
          - name: S3_BUCKET_NAME
            value: my-bucket
          - name: S3_REGION
            value: us-east-1
          - name: S3_SERVER_PROTO
            value: https
          - name: AWS_SIGS_VERSION
            value: "4"
          - name: S3_SERVER
            value: s3.us-east-1.amazonaws.com
          - name: S3_SERVER_PORT
            value: "443"
          - name: S3_STYLE
            value: path
          - name: DEBUG
            value: "true"
          - name: AWS_ROLE_SESSION_NAME
            value: nginx-s3-proxy

Then try to fetch a file from the bucket, for example by port-forwarding the pod to your local machine and calling it with curl:
curl -XGET http://localhost:<some-port>/my-file.txt

Expected behavior
Assume role should work with the supplied configuration (unless I am missing something).

Your environment

  • AWS EKS v1.26
  • Nginx S3 gateway OSS build, most recent commit 1675b5a

Thanks for taking a look!

I had the same error, but in my case the issue was in the IAM role!

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::{account id}:oidc-provider/oidc.eks.{region}.amazonaws.com/id/{your oidc id}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.{region}.amazonaws.com/id/{your oidc id}:aud": "sts.amazonaws.com",
                    "oidc.eks.{region}.amazonaws.com/id/{your oidc id}:sub": "system:serviceaccount:{namespace}:{application name}"
                }
            }
        }
    ]
}

In this case, assume role and IRSA are both working as expected. I can reach STS from the nginx pod with the assigned web identity token and get credentials. This particular issue is not with my IRSA config.

Can you provide any additional environment variables you have set in this environment if there are more set than specified in the above spec file? Please redact sensitive information.

I actually sorted this out: uncommenting the js_fetch_trusted_certificate directive did in fact do the trick. In my frustrated delirium, I had changed the image pull policy from Always, which meant my node kept using an old image, without the certificate enabled, during the testing above.
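A quick way to catch a stale image like this is to check the configuration the running container actually loaded. The sketch below assumes the effective config has been dumped to a file; `has_fetch_trust` is a hypothetical helper name, and the check is a plain text match, not a full nginx config parser.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if the given config file
# contains an active, uncommented js_fetch_trusted_certificate directive.
has_fetch_trust() {
    grep -Eq '^[[:space:]]*js_fetch_trusted_certificate[[:space:]]+[^;]+;' "$1"
}

# Usage inside the container (left commented out because it needs nginx):
# nginx -T 2>/dev/null > /tmp/effective.conf
# has_fetch_trust /tmp/effective.conf && echo "trust store configured"
```

If the directive is missing from the running container but present in the image source, the pod is probably running an older cached image, and the deployment's imagePullPolicy is worth checking.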

But I came across another small bug, which I will list shortly.