trinodb / charts

Random S3 permission-related issues with Trino on EKS

ajinghami opened this issue

We are running Trino version 432 on a Kubernetes EKS cluster managed by AWS, using the Trino Helm chart version 0.18. We are experiencing random "Access Denied" errors when querying data on S3 buckets.

The error we receive is similar to the following:

Error running query: TrinoExternalError(type=EXTERNAL, name=HIVE_CANNOT_OPEN_SPLIT, message="Error opening Hive split s3://S3BUCKETNAME/S3FILE.parquet (offset=33554432, length=33554432): Read 49152 tail bytes of s3://S3BUCKETNAME/S3FILE.parquet failed: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 11111111111111; S3 Extended Request ID: 111111/222222222+3333/4444444444+555; Proxy: null), S3 Extended Request ID: 111111/222222222+3333/4444444444+555 (Bucket: S3BUCKETNAME, Key: S3FILE.parquet)", query_id=20240522_111111_222_3333)

This error occurs randomly and, when it does, affects all queries and users at once. It is not specific to any particular S3 bucket or table: queries against files in different S3 buckets fail in the same way.

Workarounds and Observations

  • The same query will often succeed after 4-5 attempts.
  • If the query continues to fail, deleting and recreating the Trino pods resolves the issue temporarily.
  • This issue did not occur with our previous setup using Presto 359 on EC2.

Configuration

We have configured an IAM role with the necessary S3 permissions and assigned it to the Trino pods through a ServiceAccount annotation in the Helm chart:

serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::AWSACCOUNT:role/TRINOROLE

I think it's safe to rule out S3 API quota limits and file modifications during querying, as the files are still intact and there are no connectivity problems. I suspect the issue may be related to certain Trino pods failing to assume the required IAM role.

Any suggestions on how to ensure that the pods consistently assume the correct IAM role? Any insights or recommendations to resolve this issue would be greatly appreciated.

Is it the same issue as trinodb/trino#15267?

@nineinchnick it definitely seems like we are encountering the exact same issue.
However, I don't quite understand the workaround from the thread.
Could you please provide guidance on how we can force it to always use WebIdentityTokenCredentialsProvider?

@nineinchnick This is fixed in release 450 / chart 0.25.0, right?
s3.use-web-identity-token-credentials-provider=true

Yes, these are the relevant release notes: https://trino.io/docs/current/release/release-450.html#security
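
For anyone landing here later, a minimal sketch of how the property could be set through the chart's values file. It assumes Trino 450 or later, a chart version that exposes catalog definitions under the `catalogs` key (older chart versions used `additionalCatalogs`), a Hive catalog named `hive` (hypothetical), and that the native S3 file system is enabled. The metastore URI and region are placeholders, not values from this thread.

```yaml
# values.yaml (sketch; catalog name, metastore URI and region are placeholders)
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::AWSACCOUNT:role/TRINOROLE
catalogs:
  hive: |
    connector.name=hive
    hive.metastore.uri=thrift://METASTOREHOST:9083
    # Assumption: the native S3 file system is in use, which is where the s3.* properties apply
    fs.native-s3.enabled=true
    s3.region=AWSREGION
    # Authenticate only with the web identity token of the annotated ServiceAccount
    s3.use-web-identity-token-credentials-provider=true
```

With this in place, the intent is that the pods rely only on the web identity token mounted for the annotated ServiceAccount rather than falling back to other credential sources, such as the node's instance profile.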

Let me close this as done. Let's move the discussion to trinodb/trino#15267 if you are still experiencing these issues with 450.