angeloc / s3-pit-restore

The new home for the s3-pit-restore tool!

Objects larger than 5GB cannot be restored & the error message is misleading

Makeshift opened this issue

When attempting to PIT restore files today I came across this error:

"2021-04-30 13:45:31+00:00" l035JNl1kPeWvKi1Zftbxs0ulNx59SqH 0 STANDARD VREAnalysisResult/69e521de-8942-4296-ae4f-a6252c240563/average_power_quantiles_by_generator_1H.nc ERROR: An error occurred (InvalidRequest) when calling the CopyObject operation: The specified copy source is larger than the maximum allowable size for a copy source: 5368709120

However, the file named in the error isn't above 5GB (the limit for copy_object); it's only 6MB. The file that actually failed to restore appears to be VREAnalysisResult/65424f29-4f5e-4a9b-9a3a-107d08e6d99e/average_power_quantiles_by_generator_10T.nc (which is 5.7GB), so the error message is misleading about which object caused the failure.
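To make the sizes concrete, here is a minimal sketch of the arithmetic behind the limit quoted in the error message. The helper names and the 512 MiB part size are illustrative assumptions, not anything taken from s3-pit-restore; only the 5368709120-byte figure comes from the error above.

```python
# Limit quoted in the error message: 5 GiB, i.e. 5368709120 bytes.
COPY_OBJECT_MAX_BYTES = 5 * 1024**3


def needs_multipart_copy(size_bytes):
    """Return True if the object is too large for a single copy_object call."""
    return size_bytes > COPY_OBJECT_MAX_BYTES


def part_count(size_bytes, part_size_bytes):
    """Number of parts needed to copy size_bytes in chunks of part_size_bytes."""
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-size_bytes // part_size_bytes)


small_file = 6 * 1024**2      # roughly the 6MB file named in the error
large_file = 5_700_000_000    # roughly the 5.7GB file that actually failed

print(needs_multipart_copy(small_file))
print(needs_multipart_copy(large_file))
print(part_count(large_file, 512 * 1024**2))  # hypothetical 512 MiB part size
```

Under these assumptions, only the larger file crosses the limit, which fits the observation that the 6MB object was not the real culprit.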

This error is caused by the use of Boto3's copy_object instead of copy, apparently.

Could you try making that change, check whether it solves the problem, and propose a fix?

Thanks!

Having a quick look at your code, it looks like you do actually use copy rather than copy_object, so I'm not really sure what's happening here. https://github.com/angeloc/s3-pit-restore/blob/master/s3-pit-restore#L271

According to the boto3 documentation, the copy method already handles multipart uploads.
My guess is that the default threshold for triggering the multipart mechanism is higher than the AWS limit, hence the error. We should set the threshold explicitly, as described in the documentation.
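For anyone following along, setting the threshold explicitly might look roughly like the sketch below. It is not the code from s3-pit-restore: the function names, the bucket and key arguments, and the 64 MiB value are all hypothetical. It relies only on boto3's `TransferConfig` and the client's managed `copy` method, which accepts a `Config` argument.

```python
def build_transfer_config(threshold_bytes=64 * 1024 * 1024):
    """Build a TransferConfig with an explicit multipart threshold.

    The 64 MiB default here is an arbitrary illustrative value.
    """
    # Imported lazily so restore_version can be exercised without boto3.
    from boto3.s3.transfer import TransferConfig

    return TransferConfig(
        multipart_threshold=threshold_bytes,
        multipart_chunksize=threshold_bytes,
    )


def restore_version(client, src_bucket, key, version_id,
                    dest_bucket, dest_key, config=None):
    """Copy one specific object version using the managed copy method.

    Unlike copy_object, client.copy can switch to a multipart copy for
    large objects, so it is not bound by the single-request size limit.
    """
    copy_source = {"Bucket": src_bucket, "Key": key, "VersionId": version_id}
    client.copy(copy_source, dest_bucket, dest_key, Config=config)
    return copy_source
```

In real use, `client` would be a `boto3.client("s3")` instance and `config` the result of `build_transfer_config()`; passing `config=None` leaves boto3's defaults in place.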

Ah, apologies: it seems the version I'm using does indeed use copy_object. It looks like the PyPI package is outdated, at 835119d (Nov 2018).

Would you be able to do a release to update the PyPI package?

Right, it was on the checklist. I'll do it as soon as possible.

Tested with the latest version on git and it works like a charm, so closing this issue. Thanks for your help!