awslabs / aws-sdk-kotlin

Multiplatform AWS SDK for Kotlin

STS failed to assume role from web identity: Unexpected response code for CONNECT: 403

jonasholtkamp opened this issue

Describe the bug

Our workload is running on EKS in eu-central-1 and we have OIDC/AssumeRoleWithWebIdentity configured. We're switching from software.amazon.awssdk:s3:2.20.162 to aws.sdk.kotlin:s3:1.0.35. When we try to perform an S3 operation, e.g. getting an object, we get an exception:

STS failed to assume role from web identity: Unexpected response code for CONNECT: 403

Expected behavior

We expect the Kotlin SDK code to behave the same as the Java SDK code: read an S3 object, authorizing by assuming a role with web identity.

Current behavior

This exception is thrown:

Exception in thread "DefaultDispatcher-worker-3" aws.smithy.kotlin.runtime.identity.IdentityProviderException: No identity could be resolved from the chain: CredentialsProviderChain -> SystemPropertyCredentialsProvider -> EnvironmentCredentialsProvider -> ProfileCredentialsProvider -> StsWebIdentityProvider -> EcsCredentialsProvider -> ImdsCredentialsProvider
...
Suppressed: aws.sdk.kotlin.runtime.auth.credentials.ProviderConfigurationException: Missing value for system property `aws.accessKeyId`
        at aws.sdk.kotlin.runtime.auth.credentials.SystemPropertyCredentialsProvider.requireProperty(SystemPropertyCredentialsProvider.kt:32)
...
Suppressed: aws.sdk.kotlin.runtime.auth.credentials.ProviderConfigurationException: Missing value for environment variable `AWS_ACCESS_KEY_ID`
        at aws.sdk.kotlin.runtime.auth.credentials.EnvironmentCredentialsProvider.requireEnv(EnvironmentCredentialsProvider.kt:32)
...
Suppressed: aws.sdk.kotlin.runtime.auth.credentials.ProviderConfigurationException: could not find source profile default
        at aws.sdk.kotlin.runtime.auth.credentials.profile.ProfileChain$Companion.resolve$aws_config(ProfileChain.kt:323)
...
Suppressed: aws.smithy.kotlin.runtime.auth.awscredentials.CredentialsProviderException: STS failed to assume role from web identity
        at aws.sdk.kotlin.runtime.auth.credentials.StsWebIdentityCredentialsProvider.resolve(StsWebIdentityCredentialsProvider.kt:139)
...
Caused by: aws.smithy.kotlin.runtime.http.HttpException: java.io.IOException: Unexpected response code for CONNECT: 403
        at aws.smithy.kotlin.runtime.http.engine.okhttp.OkHttpEngine.roundTrip(OkHttpEngine.kt:158)
...
Caused by: java.io.IOException: Unexpected response code for CONNECT: 403
        at okhttp3.internal.connection.ConnectPlan.createTunnel(ConnectPlan.kt:426)
...
Suppressed: aws.sdk.kotlin.runtime.auth.credentials.ProviderConfigurationException: Container credentials URI not set
        at aws.sdk.kotlin.runtime.auth.credentials.EcsCredentialsProvider.resolve(EcsCredentialsProvider.kt:82)
...
Suppressed: aws.smithy.kotlin.runtime.auth.awscredentials.CredentialsProviderException: failed to load instance profile
        at aws.sdk.kotlin.runtime.auth.credentials.ImdsCredentialsProvider.resolve(ImdsCredentialsProvider.kt:83)
...
Caused by: aws.sdk.kotlin.runtime.config.imds.EC2MetadataError: Request forbidden: IMDS is disabled or the caller has insufficient permissions.
        at aws.sdk.kotlin.runtime.config.imds.TokenMiddleware.getToken(TokenMiddleware.kt:79)
...
Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@391ab37d, Dispatchers.Default]
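Most of the Suppressed entries above are expected misses: a default credentials chain tries each provider in order and keeps every failure, so only the StsWebIdentityProvider entry is actually relevant here. A minimal sketch of that pattern, assuming hypothetical `Credentials`, `NamedProvider`, and `resolveFromChain` names (these are illustrations, not the SDK's real types):

```kotlin
// Hypothetical, simplified model of a credentials provider chain; these
// types illustrate the pattern and are not the SDK's actual classes.
data class Credentials(val accessKeyId: String, val secretAccessKey: String)

class NamedProvider(val name: String, val resolve: () -> Credentials)

fun resolveFromChain(providers: List<NamedProvider>): Credentials {
    val failure = IllegalStateException(
        "No identity could be resolved from the chain: " + providers.joinToString(" -> ") { it.name }
    )
    for (provider in providers) {
        try {
            // The first provider that succeeds wins; the rest are never consulted.
            return provider.resolve()
        } catch (e: Exception) {
            // Each miss is attached as a suppressed exception, which is why the
            // real trace shows one "Suppressed:" entry per provider in the chain.
            failure.addSuppressed(e)
        }
    }
    throw failure
}
```

Read this way, the environment, profile, ECS, and IMDS failures are just "not configured here"; the one to investigate is the web identity provider's nested cause.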

I turned on debug logging:

    <logger name="aws.sdk.kotlin" level="debug"/>
    <logger name="aws.smithy.kotlin" level="debug"/>

But that doesn't yield much more:

retrieving assumed credentials via web identity
resolved endpoint: Endpoint(uri=https://sts.eu-central-1.amazonaws.com, headers=null, attributes=aws.smithy.kotlin.runtime.collections.EmptyAttributes@65c882c4)
(two more attempts)
sts refused to grant assumed role credentials from web identity

Steps to Reproduce

This Java SDK code (shortened for clarity) works fine with OIDC/web identity:

        val s3clientBuilder = S3Client.builder()
        val s3client = s3clientBuilder.region(Region.of("regionFromConfig")).build()

        return s3client.getObject(
            GetObjectRequest.builder()
                .bucket("bucketFromConfig")
                .key("keyFromConfig").build(),
        ).use { it.readAllBytes().decodeToString() }

When trying to accomplish the same with the Kotlin SDK, we get authentication/authorization errors:

        S3Client {
            region = "regionFromConfig"
        }.use { s3 ->
            val request = GetObjectRequest {
                key = "keyFromConfig"
                bucket = "bucketFromConfig"
            }

            s3.getObject(request) { res ->
                res.body?.decodeToString()
            }
        }

Possible Solution

The error message looks like a proxy issue. Indeed, a proxy is configured via the HTTPS_PROXY and HTTP_PROXY environment variables. However, the Java SDK works just fine, and even when I configure the proxy explicitly, the result is the same:

        S3Client.fromEnvironment {
            // more config
            httpClient(OkHttpEngine) {
                proxySelector = ProxySelector { ProxyConfig.Http("proxy url with credentials") }
            }
        }

I can't find the failing call in CloudTrail > Event history, which supports my theory that the request never reaches AWS because of the proxy.

Context

We haven't changed the policy for sts:AssumeRoleWithWebIdentity, and the workload still works as expected after rolling back to the Java SDK. All environment variables look fine (the decoded payload of the token file follows the === line):

AWS_REGION=eu-central-1
AWS_STS_REGIONAL_ENDPOINTS=regional
AWS_ROLE_ARN=arn:aws:iam::<account ID>:role/<role name>
AWS_DEFAULT_REGION=eu-central-1 
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
===
{
  "aud": [
    "sts.amazonaws.com"
  ],
  "exp": 1705647934,
  "iat": 1705561534,
  "iss": "https://oidc.eks.eu-central-1.amazonaws.com/id/<oidc provider ID>",
  "kubernetes.io": {
    "namespace": "<namespace>",
    "pod": {
      "name": "<pod name>",
      "uid": "2f5650e0-bedd-41e4-9c5a-ce8ab0b6965a"
    },
    "serviceaccount": {
      "name": "<service account name>",
      "uid": "788d6e79-c2af-4b12-bd8a-be807153f1e5"
    }
  },
  "nbf": 1705561534,
  "sub": "system:serviceaccount:<namespace>:<service account name>"
}
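The claims above come from the middle (payload) segment of the projected service account token; here exp − iat = 86400 s, i.e. a 24-hour lifetime. A JDK-only sketch for checking `aud` and `exp` straight from the token file before suspecting the SDK. The helper names are hypothetical, the claim extraction is a naive regex rather than a JSON parser, and the signature is not verified:

```kotlin
import java.time.Instant
import java.util.Base64

// Decodes the payload (second dot-separated segment) of a JWT.
// No signature verification is done; this is for inspection only.
fun jwtPayloadJson(token: String): String {
    val parts = token.trim().split('.')
    require(parts.size == 3) { "expected header.payload.signature, got ${parts.size} segments" }
    // JWT segments are base64url without padding; the JDK URL decoder accepts that.
    return String(Base64.getUrlDecoder().decode(parts[1]), Charsets.UTF_8)
}

// Naive extraction of a numeric claim such as "exp" (assumption: flat numeric value).
fun numericClaim(json: String, name: String): Long? =
    Regex("\"" + Regex.escape(name) + "\"\\s*:\\s*(\\d+)").find(json)?.groupValues?.get(1)?.toLong()

// True once `now` has reached the token's exp claim.
fun isExpired(token: String, now: Instant = Instant.now()): Boolean {
    val exp = numericClaim(jwtPayloadJson(token), "exp") ?: error("token has no exp claim")
    return !now.isBefore(Instant.ofEpochSecond(exp))
}
```

Usage would be something like `isExpired(File(System.getenv("AWS_WEB_IDENTITY_TOKEN_FILE")).readText())`.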

AWS Kotlin SDK version used

1.0.35

Platform (JVM/JS/Native)

JVM

Operating System and version

eclipse-temurin:17.0.8_7-jre-alpine

Hi, thanks for the detailed bug report! I am not able to replicate it using an S3-based statically hosted OIDC provider. I can make s3:GetObject, s3:ListObjects, etc. requests with credentials sourced by the StsWebIdentityCredentialsProvider.

Can you enable and share TRACE level logs? That may give more information about what's going wrong.

Also, while looking into this issue, I found a bug where the provider does not use the client-configured region (region = "regionFromConfig" in your example) and instead falls back to the AWS_REGION environment variable.

It seems unrelated to your issue, but could that have been causing the problem for you? Is "regionFromConfig" different from the AWS_REGION=eu-central-1 environment variable?

Hey there, I'll enable TRACE logs and get back to you. "regionFromConfig" and AWS_REGION are both set to eu-central-1, so that can't be the issue.

Interesting: the TRACE logs have more to say:

Attempting to resolve identity from aws.sdk.kotlin.runtime.auth.credentials.StsWebIdentityProvider@b5ed93e
retrieving assumed credentials via web identity
operation started
resolved endpoint: Endpoint(uri=https://sts.eu-central-1.amazonaws.com, headers=null, attributes=aws.smithy.kotlin.runtime.collections.EmptyAttributes@5e6218ce)
call started
proxy select start: url=https://sts.eu-central-1.amazonaws.com/
proxy select end: url=https://sts.eu-central-1.amazonaws.com/; proxies=[HTTP @ <proxy host>/<proxy ip>:<proxy port>]
starting connection: addr=/<proxy ip>:<proxy port>; proxy=HTTP @ <proxy host>/<proxy ip>:<proxy port>
connect failed: addr=/<proxy ip>:<proxy port>; proxy=HTTP @ <proxy host>/<proxy ip>:<proxy port>; protocol=null
call failed
retrying request, attempt 2
(attempt 2, same as above)
retrying request, attempt 3
(attempt 3, same as above)
operation failed
sts refused to grant assumed role credentials from web identity
unable to resolve identity from aws.sdk.kotlin.runtime.auth.credentials.StsWebIdentityProvider@b5ed93e: STS failed to assume role from web identity
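The trace shows the STS call being attempted three times before "operation failed"; each attempt is tagged in the amz-sdk-request header (attempt=3; max=3 in the interceptor output later in this thread). A persistent 403 from the proxy fails identically on every attempt, so retries cannot help. A hypothetical sketch of a bounded retry loop that tags attempts the same way; real SDK retry strategies typically add backoff and jitter, which are omitted here:

```kotlin
// Hypothetical error raised once every attempt has failed.
class AttemptsExhausted(val attempts: Int, cause: Throwable) :
    Exception("failed after $attempts attempts", cause)

// Calls `call` up to maxAttempts times, passing an amz-sdk-request style tag.
// No backoff or jitter: this only illustrates attempt counting.
fun <T> withRetries(maxAttempts: Int, call: (requestHeader: String) -> T): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var last: Throwable? = null
    for (attempt in 1..maxAttempts) {
        try {
            return call("attempt=$attempt; max=$maxAttempts")
        } catch (e: Exception) {
            last = e // remember the most recent failure and try again
        }
    }
    throw AttemptsExhausted(maxAttempts, last!!)
}
```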

<proxy host> is the proxy's hostname as defined in HTTPS_PROXY/HTTP_PROXY, but without the username and password. HTTPS_PROXY is set to http://<username>:<password>@<proxy host>:<proxy port>, yet the log shows only <proxy host>. It looks to me (I'm only an amateur!) as if the proxy is selected correctly, but the connection is made without the credentials.
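For context on what the proxy sees: the user info in a proxy URL is typically sent as a Basic `Proxy-Authorization` header on the CONNECT request, and loggers commonly omit it. Also note that a proxy demanding credentials usually answers 407 (Proxy Authentication Required), whereas 403 more often means it understood the request but refuses to tunnel to that destination. A JDK-only sketch of parsing such a URL and deriving the header value; `ProxySettings`, `parseProxyUrl`, and `proxyAuthorization` are hypothetical names, not how the SDK or OkHttp do it internally:

```kotlin
import java.net.URI
import java.util.Base64

// Hypothetical holder for the parts of a proxy URL like http://user:pass@host:3128
data class ProxySettings(val host: String, val port: Int, val user: String?, val password: String?)

fun parseProxyUrl(value: String): ProxySettings {
    val uri = URI(value)
    val host = requireNotNull(uri.host) { "no host in proxy URL" }
    // URI.getUserInfo() returns the percent-decoded user info ("user:pass").
    // Splitting on the first ':' assumes the user name contains no colon.
    val userInfo = uri.userInfo?.split(':', limit = 2)
    val port = when {
        uri.port != -1 -> uri.port
        uri.scheme.equals("https", ignoreCase = true) -> 443
        else -> 80
    }
    return ProxySettings(host, port, userInfo?.get(0), userInfo?.getOrNull(1))
}

// Header value a client would send as Proxy-Authorization on CONNECT when the
// proxy uses Basic authentication; null when the URL carries no user info.
fun proxyAuthorization(settings: ProxySettings): String? {
    val user = settings.user ?: return null
    val credentials = "$user:${settings.password.orEmpty()}"
    return "Basic " + Base64.getEncoder().encodeToString(credentials.toByteArray(Charsets.UTF_8))
}
```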

That helps narrow down the problem a lot. Thanks! I'll take a look and get back to you.

Hi, I've set up a mitmproxy server locally with a username and password and am unable to replicate the failure. I did notice that the user info (<username>:<password>) is not logged, as you showed, but it is used correctly internally.

I think we should step back and look at the raw sts:AssumeRoleWithWebIdentity request that each SDK is making, including the headers and body; this would let us see if there are any critical differences in how the SDKs are sending these requests.

Thanks for the time you've invested. Any idea how I could log the STS request? Any interceptors on S3Client would only log information about the S3 request:

readBeforeExecution
GetObjectRequest(bucket=<bucket name>,checksumMode=null,expectedBucketOwner=null,ifMatch=null,ifModifiedSince=null,ifNoneMatch=null,ifUnmodifiedSince=null,key=<object key>,partNumber=null,range=null,requestPayer=null,responseCacheControl=null,responseContentDisposition=null,responseContentEncoding=null,responseContentLanguage=null,responseContentType=null,responseExpires=null,sseCustomerAlgorithm=null,sseCustomerKey=*** Sensitive Data Redacted ***,sseCustomerKeyMd5=null,versionId=null)
readBeforeSerialization
GetObjectRequest(bucket=<bucket name>,checksumMode=null,expectedBucketOwner=null,ifMatch=null,ifModifiedSince=null,ifNoneMatch=null,ifUnmodifiedSince=null,key=<object key>,partNumber=null,range=null,requestPayer=null,responseCacheControl=null,responseContentDisposition=null,responseContentEncoding=null,responseContentLanguage=null,responseContentType=null,responseExpires=null,sseCustomerAlgorithm=null,sseCustomerKey=*** Sensitive Data Redacted ***,sseCustomerKeyMd5=null,versionId=null)
readBeforeAttempt
GetObjectRequest(bucket=<bucket name>,checksumMode=null,expectedBucketOwner=null,ifMatch=null,ifModifiedSince=null,ifNoneMatch=null,ifUnmodifiedSince=null,key=<object key>,partNumber=null,range=null,requestPayer=null,responseCacheControl=null,responseContentDisposition=null,responseContentEncoding=null,responseContentLanguage=null,responseContentType=null,responseExpires=null,sseCustomerAlgorithm=null,sseCustomerKey=*** Sensitive Data Redacted ***,sseCustomerKeyMd5=null,versionId=null)

To force logging for sts:AssumeRoleWithWebIdentity, I added a manual call:

        StsClient {
            region = "regionFromConfig"
            interceptors += interceptorList
        }.use { sts ->
            val req = AssumeRoleWithWebIdentityRequest {
                roleArn = "<role ARN>"
                webIdentityToken = File("/var/run/secrets/eks.amazonaws.com/serviceaccount/token").readText()
            }
            val identity = sts.assumeRoleWithWebIdentity(req)
            logger.info("Loaded identity: {}", identity)
        }

I added interceptors for all readBefore* functions like this:

            override fun readBeforeTransmit(context: ProtocolRequestInterceptorContext<Any, HttpRequest>) {
                println("readBeforeTransmit")
                println(context.request)
                println(context.protocolRequest.url)
                println(context.protocolRequest.headers)
                runBlocking { println(context.protocolRequest.body.toByteStream()?.decodeToString()) }
                super.readBeforeTransmit(context)
            }
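The interceptor above prints the raw request body, which contains the web identity token itself, while the SDK's own toString output masks it as *** Sensitive Data Redacted ***. A hypothetical helper for masking sensitive fields in a form-encoded body before logs are shared; the default key set is an assumption covering only this request:

```kotlin
// Hypothetical helper: masks values of sensitive keys in an
// application/x-www-form-urlencoded body such as "Action=...&WebIdentityToken=...".
fun redactFormBody(body: String, sensitiveKeys: Set<String> = setOf("WebIdentityToken")): String =
    body.split('&').joinToString("&") { pair ->
        val key = pair.substringBefore('=')
        // Keep non-sensitive pairs verbatim; replace only the value of sensitive ones.
        if (key in sensitiveKeys && '=' in pair) "$key=***" else pair
    }
```

Inside the interceptor, the body string would be passed through `redactFormBody` before printing.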

readBeforeSigning and readBeforeTransmit show the most detailed logs:

readBeforeTransmit
AssumeRoleWithWebIdentityRequest(durationSeconds=null,policy=null,policyArns=null,providerId=null,roleArn=<role ARN>,roleSessionName=null,webIdentityToken=*** Sensitive Data Redacted ***)
https://sts.eu-central-1.amazonaws.com/
Headers [Content-Type=[application/x-www-form-urlencoded], User-Agent=[aws-sdk-kotlin/1.0.35 ua/2.0 api/sts#1.0.35 os/linux#5.10.198-187.748.amzn2.x86_64 lang/kotlin#1.9.22 md/javaVersion#17.0.8 md/jvmName#OpenJDK_64-Bit_Server_VM md/jvmVersion#17.0.8+7 ], x-amz-user-agent=[aws-sdk-kotlin/1.0.35], amz-sdk-invocation-id=[13a8160c-e392-46fb-aac6-2fad0318bf90], amz-sdk-request=[attempt=3; max=3], Host=[sts.eu-central-1.amazonaws.com]]
Action=AssumeRoleWithWebIdentity&Version=2011-06-15&RoleArn=<url-encoded role ARN>&WebIdentityToken=<content of token file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token>
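This body is a plain AWS Query-protocol form (application/x-www-form-urlencoded). Comparing it with the Java SDK trace below, the Java request carries RoleSessionName (aws-sdk-java-1706008407972), while this manual call sends none because `roleSessionName` isn't set. The STS API reference lists RoleSessionName as required for AssumeRoleWithWebIdentity, so this diagnostic call would likely be rejected by STS even once it got past the proxy. A JDK-only sketch that builds the same body with a session name; the function name is hypothetical:

```kotlin
import java.net.URLEncoder

// Builds an AWS Query-protocol body for sts:AssumeRoleWithWebIdentity.
// Parameter names follow the STS API, where RoleSessionName is required.
fun assumeRoleWithWebIdentityBody(roleArn: String, roleSessionName: String, webIdentityToken: String): String {
    val params = linkedMapOf(
        "Action" to "AssumeRoleWithWebIdentity",
        "Version" to "2011-06-15",
        "RoleArn" to roleArn,
        "RoleSessionName" to roleSessionName,
        "WebIdentityToken" to webIdentityToken,
    )
    // Form encoding: encode each key and value, then join with '=' and '&'.
    return params.entries.joinToString("&") { (k, v) ->
        URLEncoder.encode(k, Charsets.UTF_8) + "=" + URLEncoder.encode(v, Charsets.UTF_8)
    }
}
```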

As for the Java SDK, I configured logging like this:

    <logger name="software.amazon.awssdk" level="trace" />
    <logger name="software.amazon.awssdk.request" level="trace" />
    <logger name="org.apache.http.wire" level="trace" />

This produced the following logs (I hope I extracted the right part):

{"@timestamp":"2024-01-23T11:13:28.545781235Z","@version":"1","message":"(StsAssumeRoleWithWebIdentityCredentialsProvider()) Cached value is stale and will be refreshed.","logger_name":"software.amazon.awssdk.utils.cache.CachedSupplier","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000"}
{"@timestamp":"2024-01-23T11:13:28.546200211Z","@version":"1","message":"(StsAssumeRoleWithWebIdentityCredentialsProvider()) Refreshing cached value.","logger_name":"software.amazon.awssdk.utils.cache.CachedSupplier","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000"}
{"@timestamp":"2024-01-23T11:13:28.553257997Z","@version":"1","message":"Creating an interceptor chain that will apply interceptors in the following order: [software.amazon.awssdk.core.internal.interceptor.HttpChecksumValidationInterceptor@3b4717fc, software.amazon.awssdk.awscore.interceptor.HelpfulUnknownHostExceptionInterceptor@7d4411f, software.amazon.awssdk.awscore.eventstream.EventStreamInitialRequestInterceptor@f7e7260, software.amazon.awssdk.awscore.interceptor.TraceIdExecutionInterceptor@1bfcb241, software.amazon.awssdk.services.sts.endpoints.internal.StsResolveEndpointInterceptor@1807be35, software.amazon.awssdk.services.sts.endpoints.internal.StsEndpointAuthSchemeInterceptor@3dfed893, software.amazon.awssdk.services.sts.endpoints.internal.StsRequestSetEndpointInterceptor@416f403e, software.amazon.awssdk.protocols.query.interceptor.QueryParametersToBodyInterceptor@3fff9ce8, datadog.trace.instrumentation.aws.v2.TracingExecutionInterceptor@1a2c6b34]","logger_name":"software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000"}
{"@timestamp":"2024-01-23T11:13:28.653186153Z","@version":"1","message":"Interceptor 'software.amazon.awssdk.services.sts.endpoints.internal.StsEndpointAuthSchemeInterceptor@3dfed893' modified the message with its modifyRequest method.","logger_name":"software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000"}
{"@timestamp":"2024-01-23T11:13:28.653484764Z","@version":"1","message":"Old: AssumeRoleWithWebIdentityRequest(RoleArn=<role ARN>, RoleSessionName=aws-sdk-java-1706008407972, WebIdentityToken=*** Sensitive Data Redacted ***)\nNew: AssumeRoleWithWebIdentityRequest(RoleArn=<role ARN>, RoleSessionName=aws-sdk-java-1706008407972, WebIdentityToken=*** Sensitive Data Redacted ***)","logger_name":"software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain","thread_name":"DefaultDispatcher-worker-1","level":"TRACE","level_value":5000"}
{"@timestamp":"2024-01-23T11:13:28.755322612Z","@version":"1","message":"Interceptor 'software.amazon.awssdk.services.sts.endpoints.internal.StsRequestSetEndpointInterceptor@416f403e' modified the message with its modifyHttpRequest method.","logger_name":"software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000"}
{"@timestamp":"2024-01-23T11:13:28.755740875Z","@version":"1","message":"Old: DefaultSdkHttpFullRequest(httpMethod=POST, protocol=https, host=sts.eu-central-1.amazonaws.com, encodedPath=, headers=[], queryParameters=[Action, Version, RoleArn, RoleSessionName, WebIdentityToken])\nNew: DefaultSdkHttpFullRequest(httpMethod=POST, protocol=https, host=sts.eu-central-1.amazonaws.com, encodedPath=, headers=[], queryParameters=[Action, Version, RoleArn, RoleSessionName, WebIdentityToken])","logger_name":"software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain","thread_name":"DefaultDispatcher-worker-1","level":"TRACE","level_value":5000"}
{"@timestamp":"2024-01-23T11:13:28.757949101Z","@version":"1","message":"Interceptor 'software.amazon.awssdk.protocols.query.interceptor.QueryParametersToBodyInterceptor@3fff9ce8' modified the message with its modifyHttpRequest method.","logger_name":"software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000"}
{"@timestamp":"2024-01-23T11:13:28.758252021Z","@version":"1","message":"Old: DefaultSdkHttpFullRequest(httpMethod=POST, protocol=https, host=sts.eu-central-1.amazonaws.com, encodedPath=, headers=[], queryParameters=[Action, Version, RoleArn, RoleSessionName, WebIdentityToken])\nNew: DefaultSdkHttpFullRequest(httpMethod=POST, protocol=https, host=sts.eu-central-1.amazonaws.com, encodedPath=, headers=[Content-Length, Content-Type], queryParameters=[])","logger_name":"software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain","thread_name":"DefaultDispatcher-worker-1","level":"TRACE","level_value":5000"}
{"@timestamp":"2024-01-23T11:13:28.760609159Z","@version":"1","message":"Interceptor 'datadog.trace.instrumentation.aws.v2.TracingExecutionInterceptor@1a2c6b34' modified the message with its modifyHttpRequest method.","logger_name":"software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000"}
{"@timestamp":"2024-01-23T11:13:28.760789670Z","@version":"1","message":"Old: DefaultSdkHttpFullRequest(httpMethod=POST, protocol=https, host=sts.eu-central-1.amazonaws.com, encodedPath=, headers=[Content-Length, Content-Type], queryParameters=[])\nNew: DefaultSdkHttpFullRequest(httpMethod=POST, protocol=https, host=sts.eu-central-1.amazonaws.com, encodedPath=, headers=[Content-Length, Content-Type, X-Amzn-Trace-Id], queryParameters=[])","logger_name":"software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain","thread_name":"DefaultDispatcher-worker-1","level":"TRACE","level_value":5000"}
{"@timestamp":"2024-01-23T11:13:28.956462915Z","@version":"1","message":"Sending Request: DefaultSdkHttpFullRequest(httpMethod=POST, protocol=https, host=sts.eu-central-1.amazonaws.com, encodedPath=, headers=[amz-sdk-invocation-id, Content-Length, Content-Type, User-Agent, X-Amzn-Trace-Id], queryParameters=[])","logger_name":"software.amazon.awssdk.request","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000"}
{"@timestamp":"2024-01-23T11:13:29.171758529Z","@version":"1","message":"Connecting to sts.eu-central-1.amazonaws.com/10.28.20.131:443","logger_name":"software.amazon.awssdk.http.apache.internal.conn.SdkTlsSocketFactory","thread_name":"DefaultDispatcher-worker-1","level":"TRACE","level_value":5000,"dd.trace_id":"8919891913232501401""}
{"@timestamp":"2024-01-23T11:13:29.349985724Z","@version":"1","message":"socket.getSupportedProtocols(): [TLSv1.3, TLSv1.2, TLSv1.1, TLSv1, SSLv3, SSLv2Hello], socket.getEnabledProtocols(): [TLSv1.3, TLSv1.2]","logger_name":"software.amazon.awssdk.http.apache.internal.conn.SdkTlsSocketFactory","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000,"dd.trace_id":"8919891913232501401""}
{"@timestamp":"2024-01-23T11:13:30.153450675Z","@version":"1","message":"created: sts.eu-central-1.amazonaws.com/10.28.20.131:443","logger_name":"software.amazon.awssdk.http.apache.internal.net.SdkSslSocket","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000,"dd.trace_id":"8919891913232501401""}
{"@timestamp":"2024-01-23T11:13:30.254453952Z","@version":"1","message":"Received successful response: 200, Request ID: 298d99b9-7cb2-4ce6-a1d7-a4176bd4d59e, Extended Request ID: not available","logger_name":"software.amazon.awssdk.requestId","thread_name":"DefaultDispatcher-worker-1","level":"DEBUG","level_value":10000"}

> Any interceptors on S3Client would only log information about the S3 request

That was another bug, fixed here: smithy-lang/smithy-kotlin#1027

Thanks for providing the trace logs; you extracted the correct parts of each. However, I can't see any difference between the two requests that could be causing the failure...

Next I am going to try replicating this by creating an EKS cluster similar to your setup.

Hi again, I've replicated your setup as closely as I could and still see no failures. I have an EKS environment using the same image as you. My test code authenticates using the StsWebIdentityCredentialsProvider and makes an S3 request. It succeeds with and without a proxy (mitmproxy running locally).

Replicating this issue is getting very tricky... are you able to share which proxy you're using?

Hey there, I'm 99% sure I found the issue. It is indeed a difference in behavior between the Java SDK and the Kotlin SDK, but it's mostly on our end:

Calls to AWS endpoints (e.g. sts.eu-central-1.amazonaws.com) are not supposed to be proxied in our environment. Our team never added AWS endpoint hostnames to our NO_PROXY configuration because the Java SDK (for whatever reason) didn't take the proxy environment variables into account, so it didn't matter. (The Java SDK log output above also shows no reference to a proxy: it connects directly to sts.eu-central-1.amazonaws.com.)

After migrating to the Kotlin SDK, HTTPS_PROXY and NO_PROXY are honored, hence the connection problems: the STS requests now go through a proxy that refuses to tunnel them. I'm sorry I missed that AWS endpoint requests weren't proxied with the Java SDK in the first place; noticing that earlier would have spared us both a lot of debugging.
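Since NO_PROXY is now honored, the environment-side fix is to list the AWS endpoint domains there (for example `.amazonaws.com`, or just the specific STS and S3 hostnames). A minimal, hypothetical matcher showing the common convention that a domain entry also covers its subdomains; real implementations (curl, Go, the JVM's `http.nonProxyHosts`) differ on ports, CIDR ranges, and wildcards, so this is only an illustration:

```kotlin
// Hypothetical, simplified NO_PROXY matcher; not the SDK's implementation.
fun bypassesProxy(host: String, noProxy: String?): Boolean {
    if (noProxy.isNullOrBlank()) return false
    val target = host.lowercase().trimEnd('.')
    return noProxy.split(',')
        .map { it.trim().lowercase() }
        .filter { it.isNotEmpty() }
        .any { entry ->
            if (entry == "*") return@any true // "*" disables proxying entirely
            // ".amazonaws.com", "*.amazonaws.com" and "amazonaws.com" all match
            // the domain itself and its subdomains, but not "notamazonaws.com".
            val domain = entry.removePrefix("*").removePrefix(".")
            target == domain || target.endsWith(".$domain")
        }
}
```

With `NO_PROXY=localhost,.amazonaws.com`, `bypassesProxy("sts.eu-central-1.amazonaws.com", System.getenv("NO_PROXY"))` would return true under this convention.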

Nice, good job nailing the root cause of the issue! I'll be closing this now since it's not a problem with the Kotlin SDK. Thanks for the report!
