awslabs / aws-crt-java

Java bindings for the AWS Common Runtime

Certificates from default truststore not working with TLS

Jeansen opened this issue · comments

Describe the bug

At work, we use the S3TransferManager. The excerpt below shows how we build the
AwsCrtAsyncHttpClient. Unfortunately, TLS does not work with this client.

SdkAsyncHttpClient httpClient = AwsCrtAsyncHttpClient.builder()
         .maxConcurrency(64)
         .build();

S3Configuration serviceConfiguration = S3Configuration.builder()
         .checksumValidationEnabled(false)
         .chunkedEncodingEnabled(true)
         .build();


var localS3AsyncClientBuilder = S3AsyncClient.builder().httpClient(httpClient)
         .region(Region.of("dus"))
         .serviceConfiguration(serviceConfiguration);

Later on, credentials are added

.credentialsProvider(StaticCredentialsProvider.create(AwsBasicCredentials.create(accessKeyId, secretAccessKey)))
.build()

and it is then passed to the transfer manager

S3TransferManager.builder()
         .s3Client(s3AsyncClient)
         .build(),

In the extended logs, I see something like this:
Client connection failed with error 1029 (AWS_IO_TLS_ERROR_NEGOTIATION_FAILURE).

And checking with Wireshark shows me:
Description: Certificate Unknown (46)

The same is true if I use the CRT client like this:

         S3AsyncClient.crtBuilder()
             .region(Region.of("dus"))
             .checksumValidationEnabled(false)
             .maxConcurrency(multipartMaxConcurrency.orElse(null))
             .minimumPartSizeInBytes(multipartMinPartSize.orElse(null))
             .targetThroughputInGbps(multipartTargetThroughput.orElse(null));

BUT, when I use the NettyNioAsyncHttpClient, it works. No TLS Handshake problem,
and even TLS 1.3 is used.

Now, I am stuck. I could not find any (documented) way to tell the CRT Client how
to find the relevant certificates needed for the TLS handshake to succeed.
Currently, I've simply added them to the default JDK truststore.

Any help is greatly appreciated.

If this is the wrong place for this issue or I simply missed something in the docs, then please bear with me.

Expected Behavior

AwsCrtAsyncHttpClient should behave the same as NettyNioAsyncHttpClient with respect to TLS handshake and negotiation.

Current Behavior

AwsCrtAsyncHttpClient is not usable. The TLS handshake fails, whereas with NettyNioAsyncHttpClient the TLS handshake and negotiation work like a charm.

Reproduction Steps

Unfortunately, I do not have a simple, working code fragment. Everything I can say is already in the detailed bug report above.

Possible Solution

No response

Additional Information/Context

We use the AWS SDK to connect to our local Cloudian S3 storage. In addition we use Quarkus (not native, yet).

aws-crt-java version used

0.24.1

Java version used

OpenJdk 11 and GraalVM 22.3

Operating System and version

MacOS 12.3

Hello @Jeansen ,

Thank you very much for your submission. I am assuming here that you are using the AWS SDK for Java v2? (If that is the case, could you please provide the SDK version used?)

If possible could you please provide the full debug log?
You can enable CRT Debug logging in two ways:

Either add in the code:

// logging to a file
Log.initLoggingToFile(Log.LogLevel.Debug, "log.txt");

or

// logging to the console
Log.initLoggingToStdout(Log.LogLevel.Debug);

before the SDK client initialization.

Or use the following system properties:
-Daws.crt.debugnative=true -Daws.crt.log.level=Trace -Daws.crt.log.destination=File

This will allow me to have more context on the behavior you are experiencing.

Best regards,

Yasmine

Look for a line like: "...Based on OS, we detected the default PKI path as..." or "...Default TLS trust store not found on this system..."

If you see the "...detected the default PKI path..." line, is the path correct? If not, what should the path be for your OS, and what OS is it?

@graebm There's no such line in the logs. I provided the OS in the initial bug description.

@yasminetalby Thanks for your quick response.

Here is the dependency list with respect to all the AWS dependencies:

[INFO]    software.amazon.awssdk:s3:jar:2.20.119:compile -- module software.amazon.awssdk.services.s3 [auto]
[INFO]    software.amazon.awssdk:aws-xml-protocol:jar:2.20.119:compile -- module software.amazon.awssdk.protocols.xml [auto]
[INFO]    software.amazon.awssdk:aws-query-protocol:jar:2.20.119:compile -- module software.amazon.awssdk.protocols.query [auto]
[INFO]    software.amazon.awssdk:profiles:jar:2.20.119:compile -- module software.amazon.awssdk.profiles [auto]
[INFO]    software.amazon.awssdk:endpoints-spi:jar:2.20.119:compile -- module software.amazon.awssdk.endpoints [auto]
[INFO]    software.amazon.awssdk:url-connection-client:jar:2.20.119:compile -- module software.amazon.awssdk.http.urlconnection [auto]
[INFO]    software.amazon.awssdk:utils:jar:2.20.119:compile -- module software.amazon.awssdk.utils [auto]
[INFO]    software.amazon.awssdk:annotations:jar:2.20.119:compile -- module software.amazon.awssdk.annotations [auto]
[INFO]    software.amazon.awssdk:http-client-spi:jar:2.20.119:compile -- module software.amazon.awssdk.http [auto]
[INFO]    software.amazon.awssdk:s3-transfer-manager:jar:2.20.119:compile -- module software.amazon.awssdk.transfer.s3 [auto]
[INFO]    software.amazon.awssdk:sdk-core:jar:2.20.119:compile -- module software.amazon.awssdk.core [auto]
[INFO]    software.amazon.awssdk:regions:jar:2.20.119:compile -- module software.amazon.awssdk.regions [auto]
[INFO]    software.amazon.awssdk:arns:jar:2.20.119:compile -- module software.amazon.awssdk.arns [auto]
[INFO]    software.amazon.awssdk:aws-core:jar:2.20.119:compile -- module software.amazon.awssdk.awscore [auto]
[INFO]    software.amazon.awssdk:json-utils:jar:2.20.119:compile -- module software.amazon.awssdk.protocols.jsoncore [auto]
[INFO]    software.amazon.awssdk:third-party-jackson-core:jar:2.20.119:compile -- module software.amazon.awssdk.thirdparty.jackson.core [auto]
[INFO]    software.amazon.awssdk:protocol-core:jar:2.20.119:compile -- module software.amazon.awssdk.protocols.core [auto]
[INFO]    software.amazon.awssdk:auth:jar:2.20.119:compile -- module software.amazon.awssdk.auth [auto]
[INFO]    software.amazon.awssdk.crt:aws-crt:jar:0.24.1:compile -- module aws.crt (auto)
[INFO]    software.amazon.awssdk:aws-crt-client:jar:2.20.119:compile -- module software.amazon.awssdk.http.crt [auto]
[INFO]    software.amazon.awssdk:metrics-spi:jar:2.20.119:compile -- module software.amazon.awssdk.metrics [auto]
[INFO]    software.amazon.awssdk:crt-core:jar:2.20.119:compile -- module software.amazon.awssdk.crtcore [auto]

And here's the log:
log1.log

I had to clean it up a bit and replaced our Cloudian domain with a pseudo-domain. Otherwise, the log file is untouched.

Hello @Jeansen ,

Thank you very much for providing all this information.

Here, we can see:

[WARN] [2023-08-30T07:41:39Z] [0000700012693000] [tls-handler] - id=0x7fde7708f830: negotiation failed with OSStatus -9807.1

This OSStatus corresponds to errSSLXCertChainInvalid: invalid certificate chain.

It seems like there might be an issue with your certificate or the configuration profile on your device.

Best,

Yasmine

@yasminetalby
Yes, that's what I found out so far, too. The problem is, it works like a charm if I replace AwsCrtAsyncHttpClient with NettyNioAsyncHttpClient. So I wonder why it works with one client but not with the other. They should be interchangeable and honor all the same settings.

I used the OpenSSL command-line tools to check our certificate chain. It's all fine and valid.

And as I wrote in the initial description, on the network layer it looks like no certificate is provided at all. The problem reported on the OS layer is just a bit higher up in the call stack and, in my interpretation, simply misleading.

But maybe I am missing something. Anyway, my expectation is that either client should take any explicit or default Java trust store into account which currently does not seem to be the case.

AwsCrtAsyncHttpClient uses Apple's native Security Framework for doing TLS.

NettyNioAsyncHttpClient is likely using the JDK's SSLContext (based on these docs).

And when you use OpenSSL tools ... that's using OpenSSL.

Sadly, these are 3 totally different security libraries. It's possible Apple's Security Framework is rejecting a certificate that OpenSSL and the JDK's SSLContext will accept.
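For comparison, the certificates that the JDK's default SSLContext trusts can be listed with the standard JSSE APIs. The sketch below is only an illustration, assuming NettyNioAsyncHttpClient ends up using the JVM's default trust managers as suggested above; the class and method names (JvmTrustAnchors, defaultTrustAnchors) are hypothetical. A CA that shows up here but not in the OS trust store would be accepted by the JDK and still be unknown to a native TLS stack.

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper: lists the CA certificates the JVM trusts by default.
class JvmTrustAnchors {

    /** Returns the issuers accepted by the JVM's default X.509 trust managers. */
    static List<X509Certificate> defaultTrustAnchors() throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        // A null KeyStore selects the JVM default truststore: the one named by
        // javax.net.ssl.trustStore if set, otherwise the JDK's bundled cacerts.
        tmf.init((KeyStore) null);

        List<X509Certificate> anchors = new ArrayList<>();
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                for (X509Certificate cert : ((X509TrustManager) tm).getAcceptedIssuers()) {
                    anchors.add(cert);
                }
            }
        }
        return anchors;
    }

    public static void main(String[] args) throws Exception {
        // Print one subject DN per trusted CA.
        for (X509Certificate cert : defaultTrustAnchors()) {
            System.out.println(cert.getSubjectX500Principal().getName());
        }
    }
}
```

Running it with and without -Djavax.net.ssl.trustStore=... shows which CAs come from a custom truststore.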

What happens when you use curl to hit the endpoint? I know that curl will also use Apple's Security Framework. Does it also have a certificate error?

Also, are you hitting S3 itself or a non-AWS S3-compatible service?

@graebm When I use curl, it also works fine. The same is true with aws cli s3api ...
Anyway, it should be documented. When I use a Java library/SDK, I'd expect it to work the same anywhere: Write once, run anywhere ;-)

BTW: I work on MacOS, my co-worker in this project uses Windows. He has the same problem. And the actual service running in a Kubernetes Pod tells me the same. That's a Linux environment then.

@bretambrose We use an allegedly S3-compatible Service, Cloudian.

How exactly did you add the certificate to your Mac's default trust store? Did you pass any extra arguments to curl to make it work, like --cacert?

I want to try and reproduce the issue locally somehow, so I can play around with this more easily. Maybe I could run a local server with a self-signed cert, and add that cert to my Mac's default truststore to repro it? Or if your domain is reachable publicly, maybe you'd be comfortable emailing me (my github username at amazon com) the domain, and certs necessary to establish a TLS connection.

It doesn't seem like mTLS is involved, so just the (non-secret) root cert should maybe be sufficient.

@graebm Yes, I tried with and without the --cacert option. In either case, it worked fine. The -v option tells me a bit more and also shows me that the TLS handshake (with TLS 1.3) worked flawlessly.

Regarding a test, that is unfortunately a bit of a problem. Our Cloudian system is not public and only accessible on premises. Regarding the certificates: it's a chain of Root-CA->CA->Server.

On MacOS I dragged the relevant certificates into the keychain tool (system tree), so they are available globally.

@bretambrose No, mTLS is not involved.

> BTW: I work on MacOS, my co-worker in this project uses Windows. He has the same problem. And the actual service running in a Kubernetes Pod tells me the same. That's a Linux environment then.

I'll have to correct this one: It did not work because the service had not found any valid certificates yet. As stated earlier, the trust store is not respected by the CRT wrapper. So, I put all needed certificates in the container. Now, the TLS issue is gone.

So, either the wrapper has some issues on MacOS or some configuration on my side is missing. I doubt the latter one since all other commands and tools work just fine.

Anyway, I'd like to stress that one should update the relevant parts in the SDK documentation or it should work like any other pure Java library using a dedicated or default trust store.

> Anyway, I'd like to stress that one should update the relevant parts in the SDK documentation or it should work like any other pure Java library using a dedicated or default trust store.

Agree, it should use the default trust store, and we assumed that's what was happening. We need a local repro case to figure out why that's not happening, which is why I asked how you'd set up your machine. We'll try to get time to look into this...

I worry that there may be some confusion/misunderstanding between system-trust-store and JVM-trust-store.

@graebm If I can be of any further assistance, please don't hesitate to reach out.

Regarding the Linux setup: Our Pods use RHEL and I simply copied everything over to /etc/pki/ca-trust/source/anchors/. After that I ran update-ca-trust and everything worked.

@bretambrose What confusion? It's a JVM SDK, right? On the other hand, if there is a difference, it should be configurable. Ultimately, it should not make any difference for a Java/JVM application. Even if it is only a wrapper for some lower level native bindings. Otherwise this would break - to some degree - the uniform access principle. And as @graebm agreed, it should be documented, at least.

I think the confusion is that "trust store" to us means system trust store, but to you means the JVM trust store. That's pure speculation -- I can't tell given what's been said so far -- but it would certainly explain the behavior you've observed so far.

On newer RHEL versions, the CRT checks /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem for the system-level trust store. Can you verify that this PEM file has the certs you expect?

> On newer RHEL versions, the CRT checks /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem for the system-level trust store. Can you verify that this PEM file has the certs you expect?

That file is populated when you run update-ca-trust. Like I said, no problem here.
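The check asked for above (does the extracted bundle contain the expected CA?) can also be scripted. The following is a minimal sketch, assuming the bundle path quoted above and assuming the JDK's X.509 CertificateFactory can parse that bundle; the class name PemBundleCheck, the method matchingSubjects, and the search text "Example Root CA" are hypothetical placeholders.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: searches a PEM bundle for certificates by subject text.
class PemBundleCheck {

    /** Returns the subject DNs in the bundle that contain the given text (case-insensitive). */
    static List<String> matchingSubjects(Path bundle, String needle) throws Exception {
        List<String> hits = new ArrayList<>();
        if (!Files.isReadable(bundle)) {
            return hits; // bundle absent or unreadable, e.g. not a RHEL-style system
        }
        CertificateFactory factory = CertificateFactory.getInstance("X.509");
        try (InputStream in = Files.newInputStream(bundle)) {
            // generateCertificates reads every certificate found in the stream.
            for (Certificate cert : factory.generateCertificates(in)) {
                String subject = ((X509Certificate) cert).getSubjectX500Principal().getName();
                if (subject.toLowerCase().contains(needle.toLowerCase())) {
                    hits.add(subject);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // Path taken from the comment above; the default search text is a placeholder.
        Path bundle = Paths.get("/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem");
        String needle = args.length > 0 ? args[0] : "Example Root CA";
        System.out.println(matchingSubjects(bundle, needle));
    }
}
```

An empty result after running update-ca-trust would suggest the certificate never made it into the extracted bundle.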

> I think the confusion is that "trust store" to us means system trust store, but to you means the JVM trust store. That's pure speculation -- I can't tell given what's been said so far -- but it would certainly explain the behavior you've observed so far.

I have to disagree. In the realm of Java (and we are talking about a Java library here), trust store and key store (aka TrustStore, KeyStore) are well defined ;-) Maybe have a look here: https://www.baeldung.com/java-keystore-truststore-difference

I think what Bret was referring to is that "default truststore" has different meanings for C and Java developers, and that is causing some confusion in the discussion above.

To summarize the issue, CRT is a C library and follows typical C behavior of treating the system truststore as the default trust store. And while aws-crt-java is a Java-specific set of bindings for a C library, it currently lacks any logic to switch the default from the system truststore to the Java truststore, so it exposes the C behavior to Java.
So there is a gap between what Java devs would expect and how the library works. We'll discuss and prioritize internally how we can address this gap.
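Until that gap is closed, one workaround in line with what worked earlier in this thread is to export the CA certificates from a Java truststore as PEM and install them in the system trust store (for example via /etc/pki/ca-trust/source/anchors/ and update-ca-trust on RHEL). The sketch below uses only standard JDK APIs and is not part of aws-crt-java; the class name TrustStoreToPem is hypothetical, and KeyStore.getInstance(File, char[]) assumes Java 9 or later.

```java
import java.io.File;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.util.Base64;
import java.util.Collections;

// Hypothetical helper: dumps the trusted certificates of a Java truststore as PEM.
class TrustStoreToPem {

    /** Encodes every trusted-certificate entry of the keystore as a PEM block. */
    static String toPem(KeyStore keyStore) throws Exception {
        // PEM bodies are Base64 wrapped at 64 characters per line.
        Base64.Encoder encoder = Base64.getMimeEncoder(64, new byte[] {'\n'});
        StringBuilder pem = new StringBuilder();
        for (String alias : Collections.list(keyStore.aliases())) {
            if (!keyStore.isCertificateEntry(alias)) {
                continue; // skip private-key entries; only trust anchors are exported
            }
            Certificate cert = keyStore.getCertificate(alias);
            pem.append("-----BEGIN CERTIFICATE-----\n")
               .append(encoder.encodeToString(cert.getEncoded()))
               .append("\n-----END CERTIFICATE-----\n");
        }
        return pem.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: TrustStoreToPem <truststore-file> <password>");
            return;
        }
        // Detects the keystore type (JKS, PKCS12) from the file itself.
        KeyStore keyStore = KeyStore.getInstance(new File(args[0]), args[1].toCharArray());
        System.out.print(toPem(keyStore));
    }
}
```

The output can be redirected to a .pem file and then added to the OS trust store with the platform's usual tooling.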

I put up a PR documenting how the CRT doesn't use the Java TrustStore.

I also tried to reproduce the issue where the custom certificate was ignored on MacOS, despite you having added it to the Keychain.

But it did work for me. Here's what I did on my Intel Mac:

  • I ran a dopey Python server, using a "localhost" self-signed certificate and key.
  • curl https://localhost:4443 failed.
  • I dragged the certificate file into my keychain, then clicked Info->Trust->When using this certificate: Always Trust.
  • Now curl https://localhost:4443 succeeded.
  • I repeated the same steps using Elasticurl.java (a curl clone in our test folder that uses our HTTP and TLS classes).
  • I got the same results as with curl. It worked when the localhost cert was trusted in the Keychain. But when I removed the cert from the Keychain, the connections failed with the same TLS errors you are seeing.

@graebm OK, I'll try this on Monday. Maybe I am simply missing some trust settings. Still, it's curious that other commands work. But I am not much of a pro with all these certificates, anyway.

I think with the updated documentation in place, we can close this one. With the right settings on Linux, it works as expected (and now documented). I believe for my case with MacOS it's probably a PEBKAC then ;-).

Finally, many kudos to all involved parties who helped resolve this issue. Keep up the good work!

Agreed that it's unexpected when a Java library doesn't use the Java TrustStore. We discussed ways to make the library more Java-like, by reading the Java TrustStore and combining its certs with the certs from the OS, but it's a big task and we'd need more evidence that it would really help people before prioritizing something like that. There's a good chance it could introduce unexpected issues too, so just documenting the behavior seems like a good step for now.