Migration from 2.3.2 to 2.4.x requires AmazonS3Client's UploadPartRequest parameter's inputStream implement a working reset()

philip-fox opened this issue 5 years ago.

We (IBM GHHS Analytics team, Dublin, Ireland) are using 2.3.2's AmazonS3Client#uploadPart with an UploadPartRequest whose inputStream is a subclass of java.io.InputStream in which we override read(); mark() and reset() are left as inherited. Therefore, its reset() is:

    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

This was fine in 2.3.2, but it seems that from 2.4.0 onwards AmazonS3Client.java calls reset() on the UploadPartRequest's inputStream. However, your API suggests that UploadPartRequest accepts any InputStream, including a plain java.io.InputStream subclass that does not support mark/reset.
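To make the failure mode concrete, here is a minimal sketch of a stream like the one described: an InputStream subclass that overrides only read(). `ReadOnlyStream` and `ResetDemo` are hypothetical names, not the reporter's actual classes. Because markSupported() and reset() keep java.io.InputStream's defaults, reset() throws the same "mark/reset not supported" IOException that appears in the logs below.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the reporter's stream: only read() is overridden,
// so markSupported() returns false and reset() always throws.
class ReadOnlyStream extends InputStream {
    private final byte[] data;
    private int pos = 0;

    ReadOnlyStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        // Next byte as 0-255, or -1 at end of stream.
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }
}

class ResetDemo {
    // Returns the message of the IOException thrown by reset(), or null if reset() succeeded.
    static String tryReset(InputStream in) {
        try {
            in.reset();
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        InputStream in = new ReadOnlyStream("part data".getBytes());
        System.out.println(in.markSupported()); // false
        System.out.println(tryReset(in));       // mark/reset not supported
    }
}
```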

This is causing us to see this error:

[err] java.io.IOException: mark/reset not supported
[err] 	at java.io.InputStream.reset(InputStream.java:370)
[err] 	at [internal classes]
[err] 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
[err] 	at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
[err] 	at [internal classes]
[err] com.ibm.cloud.objectstorage.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; Request ID: ec93cdfc-d028-42ba-bdf6-ce9ee95339c4), S3 Extended Request ID: null
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
[err] 	at [internal classes]
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
[err] 	at [internal classes]
[err] 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
[err] 	at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
[err] 	at [internal classes]
[err] com.ibm.cloud.objectstorage.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; Request ID: ec93cdfc-d028-42ba-bdf6-ce9ee95339c4), S3 Extended Request ID: null
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
[err] 	at [internal classes]
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
[err] 	at [internal classes]
[err] 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
[err] 	at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
[err] 	at [internal classes]

Thus, your code is not backwards-compatible. This is currently blocking us because our Security Team requires us to migrate to the latest version of your JAR.

philip-fox · Answer 1 · Tue Apr 02 2019 22:32:21 GMT+0800 (China Standard Time)

This is the section of AmazonS3Client.java code that's causing the problem:

    if (uploadPartRequest.getMd5Digest() == null && uploadPartRequest.isCalculateMD5()) {
        try {
            request.addHeader("Content-MD5", Md5Utils.md5AsBase64(isCurr));
            isCurr.reset();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

It's my understanding that before calling reset() on an InputStream, you are expected to check that markSupported() returns true, which is not the case for java.io.InputStream.

From the API documentation of java.io.InputStream#reset:

The general contract of reset is:

- If the method markSupported returns true, then:
  - If the method mark has not been called since the stream was created, or the number of bytes read from the stream since mark was last called is larger than the argument to mark at that last call, then an IOException might be thrown.
  - If such an IOException is not thrown, then the stream is reset to a state such that all the bytes read since the most recent call to mark (or since the start of the file, if mark has not been called) will be resupplied to subsequent callers of the read method, followed by any bytes that otherwise would have been the next input data as of the time of the call to reset.
- If the method markSupported returns false, then:
  - The call to reset may throw an IOException.
  - If an IOException is not thrown, then the stream is reset to a fixed state that depends on the particular type of the input stream and how it was created. The bytes that will be supplied to subsequent callers of the read method depend on the particular type of the input stream.

The method reset for class InputStream does nothing except throw an IOException.
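For contrast, a short illustration of the markSupported() == true branch of this contract, using java.io.ByteArrayInputStream, which does support mark/reset: bytes read after mark() are resupplied after reset(). The class and method names are hypothetical, chosen only for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetContract {
    // Reads up to n bytes and returns them as a String (ASCII input assumed).
    static String readN(InputStream in, int n) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            int b = in.read();
            if (b < 0) {
                break;
            }
            sb.append((char) b);
        }
        return sb.toString();
    }

    // ByteArrayInputStream supports mark/reset: the bytes read after mark()
    // are supplied again after reset().
    static String rereadAfterReset(byte[] data, int n) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        in.mark(n);          // readlimit covers the n bytes we intend to re-read
        String first = readN(in, n);
        in.reset();          // back to the marked position
        String second = readN(in, n);
        return first + "|" + second;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(rereadAfterReset("abcdef".getBytes(), 3)); // abc|abc
    }
}
```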

barry-hueston · Answer 2 · Tue Apr 02 2019 22:54:40 GMT+0800 (China Standard Time)

Hi @philip-fox, we are investigating this issue and will give you an update ASAP.

philip-fox · Answer 3 · Tue Apr 02 2019 23:23:27 GMT+0800 (China Standard Time)

@barry-hueston-IBM Thanks Barry. As a workaround, we're calling UploadPartRequest.setCalculateMD5(false) to bypass the if statement shown in my previous comment, and that seems to work.

runnerpaul · Answer 4 · Wed Apr 03 2019 15:44:29 GMT+0800 (China Standard Time)

@philip-fox a fix for a different issue (#16) was added to version 2.4.4. It should also fix your issue.

Please check and confirm.

philip-fox · Answer 5 · Wed Apr 03 2019 17:58:09 GMT+0800 (China Standard Time)

@runnerpaul I'm afraid I was using v2.4.4 and still experienced the problem.

In a nutshell, I think the problem is that this code is being invoked on an input-stream that doesn't support mark/reset:

    if (uploadPartRequest.getMd5Digest() == null && uploadPartRequest.isCalculateMD5()) {
        try {
            request.addHeader("Content-MD5", Md5Utils.md5AsBase64(isCurr));
            isCurr.reset();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

Therefore, the line isCurr.reset(); throws the IOException.
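A simplified, hypothetical simulation of what that can mean for the upload, assuming (as the quoted code suggests) that the hashing step reads the whole part before reset() is attempted. This is not the SDK's code: once reset() fails and the exception is swallowed, the bytes left to send are not the bytes that were hashed. That is one plausible reading of the BadDigest errors in the logs above; the thread itself does not confirm the mechanism, and in the real SDK exactly which bytes get sent depends on the wrapping streams.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class SwallowedResetDemo {
    // read()-only stream: mark/reset fall back to InputStream's unsupported defaults.
    static class NonRewindableStream extends InputStream {
        private final byte[] data;
        private int pos = 0;

        NonRewindableStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }
    }

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    static String md5Base64(byte[] bytes) throws NoSuchAlgorithmException {
        return Base64.getEncoder().encodeToString(MessageDigest.getInstance("MD5").digest(bytes));
    }

    // Returns {value that would be sent as Content-MD5, MD5 of the bytes actually left to send}.
    static String[] simulate(byte[] part) throws IOException, NoSuchAlgorithmException {
        InputStream in = new NonRewindableStream(part);
        String header = md5Base64(readAll(in)); // hashing consumes the whole part
        try {
            in.reset(); // throws IOException: mark/reset not supported
        } catch (IOException e) {
            // Swallowed, mirroring the catch block in the quoted snippet.
        }
        byte[] body = readAll(in); // in this simulation the stream is already exhausted
        return new String[] {header, md5Base64(body)};
    }

    public static void main(String[] args) throws Exception {
        String[] r = simulate("part data".getBytes());
        System.out.println(r[0].equals(r[1])); // false: header and body disagree
    }
}
```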

As a workaround, I was calling UploadPartRequest.setCalculateMD5(false) in my own code so that the if statement shown above would be skipped. The problem with that is that MD5 is now not being calculated/checked for uploaded parts, which mightn't be a good idea from a security point of view.

Maybe you guys need to rethink your code; maybe you need to check if isCurr.markSupported() returns true before calling isCurr.reset().
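A minimal sketch of that suggestion, using a hypothetical helper `md5IfRewindable` (not an SDK method): compute the Base64 MD5 only when the stream can be rewound, calling mark() before reading, and otherwise leave the stream untouched and return nothing, so a caller could skip the Content-MD5 header instead of consuming the part.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Optional;

class GuardedMd5 {
    // Hypothetical helper, not an SDK method: Base64 MD5 of the next `length`
    // bytes, with the stream rewound afterwards; empty if it cannot be rewound.
    static Optional<String> md5IfRewindable(InputStream in, int length)
            throws IOException, NoSuchAlgorithmException {
        if (!in.markSupported()) {
            return Optional.empty(); // leave the stream untouched
        }
        in.mark(length);
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int remaining = length;
        while (remaining > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n < 0) {
                break; // stream shorter than `length`
            }
            md5.update(buf, 0, n);
            remaining -= n;
        }
        in.reset(); // we read no more than the readlimit given to mark()
        return Optional.of(Base64.getEncoder().encodeToString(md5.digest()));
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream("hello".getBytes());
        System.out.println(md5IfRewindable(in, 5)); // Optional[XUFAKrxLKna5cZ2REBfFkg==]
        System.out.println((char) in.read());        // h
    }
}
```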

Craig Muchinsky · Answer 6 · Wed Apr 03 2019 20:46:40 GMT+0800 (China Standard Time)

To preserve the ability to read ahead and get the hash regardless of what type of InputStream is used, perhaps UploadPartRequest.setInputStream could check markSupported() and, if necessary, wrap the given InputStream in a BufferedInputStream that buffers enough to compute the hash.
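A sketch of this wrapping idea, with hypothetical helper names `ensureMarkSupported` and `base64Md5OfPart` (neither is part of the SDK). BufferedInputStream grows its internal buffer up to the readlimit passed to mark(), so in the worst case it holds a whole part in memory; that memory cost is the trade-off of this approach.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class BufferingMd5 {
    // Hypothetical helper: leave markable streams alone, wrap everything else
    // in a BufferedInputStream so that mark()/reset() become available.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Hypothetical helper: hash the next partSize bytes, then rewind so the
    // same bytes can be read again for the upload body.
    static String base64Md5OfPart(InputStream in, int partSize)
            throws IOException, NoSuchAlgorithmException {
        in.mark(partSize); // BufferedInputStream may grow its buffer up to partSize bytes
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int remaining = partSize;
        while (remaining > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n < 0) {
                break; // the last part may be shorter than partSize
            }
            md5.update(buf, 0, n);
            remaining -= n;
        }
        in.reset();
        return Base64.getEncoder().encodeToString(md5.digest());
    }

    public static void main(String[] args) throws Exception {
        // A read()-only stream, like the reporter's, does not support mark/reset.
        InputStream raw = new InputStream() {
            private final byte[] data = "hello".getBytes();
            private int pos = 0;

            @Override
            public int read() {
                return pos < data.length ? (data[pos++] & 0xFF) : -1;
            }
        };
        InputStream in = ensureMarkSupported(raw);
        System.out.println(base64Md5OfPart(in, 5)); // XUFAKrxLKna5cZ2REBfFkg==
        System.out.println((char) in.read());       // h: the stream was rewound
    }
}
```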

philip-fox · Answer 7 · Thu Apr 04 2019 18:57:48 GMT+0800 (China Standard Time)

@barry-hueston-IBM @runnerpaul @cmuchinsky
This has now become a blocker for us, because the hack I made to our code, described earlier in this thread, isn't acceptable to our team (i.e. uploading file parts without the MD5 check being performed).

Can this ticket be raised to blocker level please and be treated with urgency?

barry-hueston · Answer 8 · Thu Apr 04 2019 19:40:23 GMT+0800 (China Standard Time)

Hi @philip-fox, we are looking at this now as a priority.

smcgrath · Answer 9 · Thu Apr 04 2019 22:00:39 GMT+0800 (China Standard Time)

Hi @philip-fox, I'm in the process of recreating this issue. Is the exception preventing uploads for you? From reading through the block of code, the exception thrown by reset() is caught and the stack trace is printed.

philip-fox · Answer 10 · Thu Apr 04 2019 22:17:42 GMT+0800 (China Standard Time)

@smcgrath-IBM Yes, it's preventing uploads when we use the later JARs (v2.4.x). Our security team here has mandated that we use these later JARs. In our code, when the upload of a part starts, our logs show the stack trace from my original report.

smcgrath · Answer 11 · Thu Apr 04 2019 22:22:12 GMT+0800 (China Standard Time)

Yes, I'd expect to see the stack trace in the logs, which is not ideal and can be tidied up with a call to isCurr.markSupported() as suggested. Can you send on the logs that follow the stack trace?

philip-fox · Answer 12 · Thu Apr 04 2019 22:33:59 GMT+0800 (China Standard Time)

@smcgrath-IBM

Just curious, in v2.3.2, were you doing the MD5 check on the parts?
It seems to me that the if-statement in your AmazonS3Client.java I mentioned above, and the call to reset(), are used for calculating the MD5.
So I'm wondering: if we continue to use an InputStream that doesn't support mark/reset, then the MD5 won't be calculated by your AmazonS3Client, right? What I mean is, marking/resetting the stream is solely to do with computing the MD5, isn't it?

This is part of our logs:

[err] java.io.IOException: mark/reset not supported
[err] 	at java.io.InputStream.reset(InputStream.java:370)
[err] 	at [internal classes]
[err] 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
[err] 	at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
[err] 	at [internal classes]
[err] com.ibm.cloud.objectstorage.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; Request ID: ec93cdfc-d028-42ba-bdf6-ce9ee95339c4), S3 Extended Request ID: null
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
[err] 	at [internal classes]
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
[err] 	at [internal classes]
[err] 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
[err] 	at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
[err] 	at [internal classes]
[err] com.ibm.cloud.objectstorage.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; Request ID: ec93cdfc-d028-42ba-bdf6-ce9ee95339c4), S3 Extended Request ID: null
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
[err] 	at [internal classes]
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
[err] 	at com.ibm.cloud.objectstorage.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
[err] 	at [internal classes]
[err] 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
[err] 	at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
[err] 	at [internal classes]

Request error encountered. <REDACTED>: The Content-MD5 you specified did not match what we received (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; Request ID: <REDACTED>-d028-42ba-bdf6-ce9ee95339c4)
	at <REDACTED>
	at <REDACTED>
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
	at java.lang.reflect.Method.invoke(Method.java:508)
	at com.ibm.ws.jaxrs20.server.LibertyJaxRsServerFactoryBean.performInvocation(LibertyJaxRsServerFactoryBean.java:652)
	at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.performInvocation(LibertyJaxRsInvoker.java:160)
	at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)
	at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:273)
	at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:191)
	at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:444)
	at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:101)
	at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:61)
	at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:99)
	at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:309)
	at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:124)
	at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:271)
	at com.ibm.ws.jaxrs20.endpoint.AbstractJaxRsWebEndpoint.invoke(AbstractJaxRsWebEndpoint.java:134)
	at com.ibm.websphere.jaxrs.server.IBMRestServlet.handleRequest(IBMRestServlet.java:146)
	at com.ibm.websphere.jaxrs.server.IBMRestServlet.doPost(IBMRestServlet.java:104)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
	at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1255)
	at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:743)
	at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:440)
	at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1221)
	at com.ibm.ws.webcontainer.webapp.WebApp.handleRequest(WebApp.java:4968)
	at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.handleRequest(DynamicVirtualHost.java:314)
	at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:992)
	at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.run(DynamicVirtualHost.java:279)
	at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink$TaskWrapper.run(HttpDispatcherLink.java:1047)
	at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.wrapHandlerAndExecute(HttpDispatcherLink.java:417)
	at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.ready(HttpDispatcherLink.java:376)
	at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:532)
	at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleNewRequest(HttpInboundLink.java:466)
	at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.processRequest(HttpInboundLink.java:331)
	at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.ready(HttpInboundLink.java:302)
	at com.ibm.ws.channel.ssl.internal.SSLConnectionLink.determineNextChannel(SSLConnectionLink.java:1059)
	at com.ibm.ws.channel.ssl.internal.SSLConnectionLink$MyReadCompletedCallback.complete(SSLConnectionLink.java:644)
	at com.ibm.ws.channel.ssl.internal.SSLReadServiceContext$SSLReadCompletedCallback.complete(SSLReadServiceContext.java:1803)
	at com.ibm.ws.tcpchannel.internal.WorkQueueManager.requestComplete(WorkQueueManager.java:501)
	at com.ibm.ws.tcpchannel.internal.WorkQueueManager.attemptIO(WorkQueueManager.java:571)
	at com.ibm.ws.tcpchannel.internal.WorkQueueManager.workerRun(WorkQueueManager.java:926)
	at com.ibm.ws.tcpchannel.internal.WorkQueueManager$Worker.run(WorkQueueManager.java:1015)
	at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:239)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(Thread.java:812)

smcgrath · Answer 13 · Thu Apr 04 2019 23:31:28 GMT+0800 (China Standard Time)

@philip-fox I'm chasing down the history of the change myself; I'll get back to you on that.

For the log above, did you run with UploadPartRequest.setCalculateMD5(false)? If so, can you run without setting it to false? It looks like the request was made but failed on the API side.
I would expect the request to go through even with the IOException stack trace. Can you run as described and send on the logs, including the stack trace and everything after it?

philip-fox · Answer 14 · Thu Apr 04 2019 23:53:49 GMT+0800 (China Standard Time)

@smcgrath-IBM That log is from a run that never calls UploadPartRequest.setCalculateMD5(x) at all, whether x is true or false.

Therefore, the instance field com.ibm.cloud.objectstorage.services.s3.model.UploadPartRequest#isCalculateMD5 keeps its default value of true, as declared in com.ibm.cloud.objectstorage.services.s3.model.UploadPartRequest:

    /**
     * Allows the caller to indicate that SDK should calculate the Content-MD5
     * for part. Defaults to true. 
     */
    private boolean isCalculateMD5 = true;

philip-fox · Answer 15 · Fri Apr 05 2019 00:38:33 GMT+0800 (China Standard Time)

@smcgrath-IBM
I'm looking at v2.3.2 AmazonS3Client.java and v2.4.4 AmazonS3Client.java.

v2.3.2:

        isCurr = new InputSubstream(
                isCurr,
                uploadPartRequest.getFileOffset(),
                partSize,
                uploadPartRequest.isLastPart());
        MD5DigestCalculatingInputStream md5DigestStream = null;
        if (uploadPartRequest.getMd5Digest() == null
                && !skipMd5CheckStrategy.skipClientSideValidationPerRequest(uploadPartRequest)) {
            /*
             * If the user hasn't set the content MD5, then we don't want to buffer the whole
             * stream in memory just to calculate it. Instead, we can calculate it on the fly
             * and validate it with the returned ETag from the object upload.
             */
            isCurr = md5DigestStream = new MD5DigestCalculatingInputStream(isCurr);
        }
        final ProgressListener listener = uploadPartRequest.getGeneralProgressListener();

v2.4.4:

        isCurr = new InputSubstream(
                isCurr,
                uploadPartRequest.getFileOffset(),
                partSize,
                closeStream);

        // Calculate Content MD5 on part upload if requested.
        if (uploadPartRequest.getMd5Digest() == null
                && uploadPartRequest.isCalculateMD5()) {
            try {
                request.addHeader("Content-MD5", Md5Utils.md5AsBase64(isCurr));
                isCurr.reset();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }

        MD5DigestCalculatingInputStream md5DigestStream = null;
        if (uploadPartRequest.getMd5Digest() == null
                && !skipMd5CheckStrategy.skipClientSideValidationPerRequest(uploadPartRequest)) {
            /*
             * If the user hasn't set the content MD5, then we don't want to buffer the whole
             * stream in memory just to calculate it. Instead, we can calculate it on the fly
             * and validate it with the returned ETag from the object upload.
             */
            isCurr = md5DigestStream = new MD5DigestCalculatingInputStream(isCurr);
        }
        final ProgressListener listener = uploadPartRequest.getGeneralProgressListener();

From the code above, it seems that in v2.3.2 there's a check for whether the UploadPartRequest does NOT have its MD5 pre-calculated and set, and if so, your code calculates it on the fly, as per the comment:

If the user hasn't set the content MD5, then we don't want to buffer the whole stream in memory just to calculate it. Instead, we can calculate it on the fly and validate it with the returned ETag from the object upload.
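The "calculate it on the fly" idea can be sketched with java.security.DigestInputStream, which updates a MessageDigest with every byte read through it. This is only an analogue: MD5DigestCalculatingInputStream is the SDK's own class and its internals are not shown in this thread. Here a ByteArrayOutputStream stands in for the request body, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class OnTheFlyMd5 {
    // Streams `in` to `out` (standing in for the request body) in one pass.
    // The DigestInputStream updates the MD5 with every byte it passes through,
    // so nothing is buffered beyond the copy buffer. Closes `in` when done.
    static byte[] copyWithMd5(InputStream in, OutputStream out)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (DigestInputStream dis = new DigestInputStream(in, md5)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = dis.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return md5.digest();
    }

    // Lower-case hex, the form in which the digest would be compared with an ETag.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] digest = copyWithMd5(new ByteArrayInputStream("hello".getBytes()), body);
        System.out.println(toHex(digest)); // 5d41402abc4b2a76b9719d911017c592
    }
}
```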

So would that suggest that in v2.3.2, MD5 validation on file parts always occurred, even if the caller didn't set the MD5 value in the UploadPartRequest via UploadPartRequest#setMd5Digest(md5Digest)?

If that's the case, then for v2.4.4, if we configure the UploadPartRequest so that the if statement below is never entered, will MD5 validation still be done? (After that if statement, the code of both versions is pretty much the same.)

        // Calculate Content MD5 on part upload if requested.
        if (uploadPartRequest.getMd5Digest() == null
                && uploadPartRequest.isCalculateMD5()) {
            try {
                request.addHeader("Content-MD5", Md5Utils.md5AsBase64(isCurr));
                isCurr.reset();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }

I now suspect that it might. That's the main question I have: does bypassing the code in that if statement prevent MD5 validation from occurring?

barry-hueston · Answer 16 · Fri Apr 05 2019 19:38:13 GMT+0800 (China Standard Time)

Tracking internally with the ticket CSAFE-53246

smcgrath · Answer 17 · Fri Apr 05 2019 20:48:20 GMT+0800 (China Standard Time)

@philip-fox MD5 is calculated on the fly on the client side and validated against the returned ETag value. The addition in 2.4.4 also sends a Content-MD5 header for server-side validation. We can work around the stack trace, but I would like to know why the MD5 check is failing. Can you run a number of requests in debug mode and send through a complete log file?
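The two checks described here use the same 16-byte MD5 digest in different encodings: the Content-MD5 header carries it Base64-encoded (RFC 1864), while the client-side check compares it with the ETag as a hex string. Below is a sketch with hypothetical helper names, assuming the service returns the part's ETag as a (possibly quoted) hex MD5. That is typical for plain part uploads but is an assumption here, and it may not hold with some server-side encryption modes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class Md5Encodings {
    static byte[] md5(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest(data);
    }

    // Content-MD5 header value: Base64 of the raw 16-byte digest (RFC 1864).
    static String contentMd5Header(byte[] digest) {
        return Base64.getEncoder().encodeToString(digest);
    }

    // Lower-case hex form of the same digest.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Client-side check, assuming the ETag is the part's hex MD5, possibly quoted.
    static boolean etagMatches(String etag, byte[] digest) {
        return etag.replace("\"", "").equalsIgnoreCase(toHex(digest));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] digest = md5("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(contentMd5Header(digest)); // XUFAKrxLKna5cZ2REBfFkg==
        System.out.println(etagMatches("\"5d41402abc4b2a76b9719d911017c592\"", digest)); // true
    }
}
```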

philip-fox · Answer 18 · Thu Apr 18 2019 22:53:23 GMT+0800 (China Standard Time)

For clarity, I supplied @smcgrath-IBM with a test application with which he can reproduce the problem.

smcgrath · Answer 19 · Thu Apr 18 2019 22:57:33 GMT+0800 (China Standard Time)

@philip-fox I haven't received the test app. Can you zip it up and drop it on the GitHub page?

smcgrath · Answer 20 · Wed May 01 2019 19:10:19 GMT+0800 (China Standard Time)

@philip-fox Release 2.4.5 went live overnight; it should address your issue. Let me know once you have tried it. thx

philip-fox · Answer 21 · Wed May 01 2019 21:00:20 GMT+0800 (China Standard Time)

@smcgrath-IBM I've just tried it: it works. Great, thanks for that. I'll close this ticket. I'll ping you on Slack.