vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes

Home Page:https://velero.io


Cannot load AWS token file when using AWS IAM-backed service accounts

geofffranks opened this issue

What steps did you take and what happened:
We deployed velero v1.2.0-beta1 in an attempt to use AWS IAM-backed service accounts in EKS, as described in #1965. When velero started, it failed with the following error:

time="2019-10-28T19:46:05Z" level=info msg="Checking that all backup storage locations are valid" logSource="pkg/cmd/server/server.go:421"
An error occurred: some backup storage locations are invalid: error getting backup store for location "default": rpc error: code = Unknown desc = WebIdentityErr: unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token
caused by: open /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied

What did you expect to happen:

Velero to start up and work

The output of the following commands will help us better understand what's going on:

  • kubectl logs deployment/velero -n velero
time="2019-10-28T19:46:05Z" level=info msg="Checking that all backup storage locations are valid" logSource="pkg/cmd/server/server.go:421"
An error occurred: some backup storage locations are invalid: error getting backup store for location "default": rpc error: code = Unknown desc = WebIdentityErr: unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token
caused by: open /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied

Anything else you would like to add:

This looks similar to the issue described here: kubernetes-sigs/external-dns#1185, so I applied the fix to our Velero deployment YAML, and that resolved the issue. Is this something that can be added to the Velero CLI's auto-generated deployment YAML?

securityContext:
  fsGroup: 65534
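For context, the `fsGroup` setting belongs at the pod level of the Velero Deployment spec (not the container level), so the projected service account token volume is made group-readable. A minimal sketch, assuming the default names and namespace produced by `velero install`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: velero
  namespace: velero
spec:
  template:
    spec:
      # Pod-level security context: 65534 is the "nobody" group,
      # which lets the non-root velero user read the projected token.
      securityContext:
        fsGroup: 65534
      containers:
        - name: velero
          # ... rest of the container spec unchanged
```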

I can confirm this. Adding the securityContext "fixes" the issue. @geofffranks thanks for pointing to this.

thanks for reporting @geofffranks -- will take a more detailed look and decide how to proceed.

Transferring this to the AWS plugin repo. I think for now we probably want to just document this for AWS users using this setup.

Confirming as of today, 2/12/2020, this issue still exists, and the fix, referenced above by @geofffranks, still works.

Issue still exists 3/25/2020, fix referenced above still works for resolving the listed error.

However, depending on networking configuration, there can be an additional error where Velero cannot reach the sts.amazonaws.com endpoint, which prevents use of the AWS IAM-backed service accounts. This could be fixed by using a newer version of the aws-sdk, since v1.25.18 adds environment variables for configuring the STS endpoint.
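If the plugin is rebuilt against a newer SDK, the regional STS endpoint can be selected via an environment variable on the Velero container. A sketch of the env entry, assuming the standard aws-sdk-go variable name introduced in v1.25.18:

```yaml
env:
  # Route STS calls to the in-region endpoint (e.g. sts.us-east-1.amazonaws.com)
  # instead of the global sts.amazonaws.com endpoint.
  - name: AWS_STS_REGIONAL_ENDPOINTS
    value: "regional"
```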

Are there any plans to update the plugin to use a newer version of the aws-sdk?

Confirmed that this issue still exists.

The fix suggested by @geofffranks still works.
The fix also works if you are installing via the Helm chart.
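For the Helm route, the same pod-level setting can be supplied through chart values. A sketch of a values file, assuming the chart exposes a pod-level security context value (the exact key name may differ between chart versions, so check your chart's values.yaml):

```yaml
# values.yaml (hypothetical key; verify against your chart version)
podSecurityContext:
  fsGroup: 65534
```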

Update:

As per @zubron on a different ticket related to this:

The core issue seems to be that all containers for Velero run as the user nobody, and the service account token is mounted with permissions 0600, preventing non-root users from reading the file (see kubernetes/kubernetes#82573). This issue has been resolved in Kubernetes, and the fix looks like it was released in v1.19.0. I don't know how that fix will be made available in EKS or whether there is more to do on the Velero side.

Action needed: document the workaround and also the fact that it is addressed in Kubernetes v1.19.0.

This would probably go under limitations in the AWS Plugin readme.