basak / glacier-cli

Command-line interface to Amazon Glacier

Warn about multipart

nomeata opened this issue · comments

Hi,

I just uploaded 32 GB of data to Glacier with your tool (thanks for it), but was afterwards surprised by the 8500 requests this caused, which cost extra money. It seems this was due to glacier-cli's 8 MB multipart size. For some reason I had expected Glacier to upload my files in one go unless I explicitly specified a multipart size.

You should mention this issue more prominently, e.g. in README.md, and possibly change the default to uploading everything in one go.

Thanks,
Joachim

Thank you for your report. I certainly don't want the tool to cause excessive charges, and agree with making sure the user is informed. There is a "Costs" section in README.md and I am happy to expand on this.

What was the actual cost difference between what you expected and what really happened? http://aws.amazon.com/glacier/pricing/ says "$0.050 per 1,000 requests". It looks like boto defaults to a 4 MiB part size, so a 32 GB upload would be split into around 8192 parts, costing roughly 45 cents. Is this the difference that you are concerned about?
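For concreteness, the arithmetic behind that estimate can be sketched as follows (using the part size and per-request price quoted above; the variable names are mine):

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

part_size = 4 * MIB              # boto's default part size
archive_size = 32 * GIB          # the 32 GB upload from the report
cost_per_1000_requests = 0.05    # USD, from the Glacier pricing page

parts = archive_size // part_size            # one upload request per part
cost = parts / 1000 * cost_per_1000_requests
print(parts, round(cost, 2))                 # 8192 parts, about $0.41
```

This counts only the upload-part requests themselves; initiating and completing the multipart upload adds a handful more, which is roughly how the reported 8500 requests and ~45 cents come about.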

The API limit is 4 GiB per request. I think the multipart upload functionality got added to boto in the first place because uploads were failing for users uploading more than 4 GiB to an archive. Some have been using parallel uploads to speed them up, though glacier-cli doesn't do this (yet). Perhaps the 4 MiB default should be reconsidered, or there should be an option to change it?
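To put the default in context, here is a sketch (my own, not glacier-cli code) of the constraints at work. Per the Glacier API documentation, part sizes must be a power of two times 1 MiB, from 1 MiB up to 4 GiB, and a multipart upload may contain at most 10,000 parts. Under those rules, 4 MiB happens to be the smallest legal part size for a 32 GiB archive:

```python
MIB = 1024 ** 2

def min_part_size(archive_size, max_parts=10_000):
    """Smallest Glacier-legal part size (power-of-two MiB, up to 4 GiB)
    that fits archive_size into at most max_parts parts."""
    size = MIB
    while size < 4096 * MIB:
        if -(-archive_size // size) <= max_parts:  # ceiling division
            return size
        size *= 2
    return size

print(min_part_size(32 * 1024 ** 3) // MIB)  # -> 4 (MiB), i.e. 8192 parts
```

So a larger default would still be valid for this archive, but the default cannot go much smaller without hitting the part-count limit on uploads of this size.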

Note that the default is coming from boto (a library for AWS), rather than glacier-cli itself. To get the default size changed you can file an issue or pull request in boto (github.com/boto/boto).

So, given all of that information, what needs to be done in glacier-cli? I'll happily accept a pull request to clarify costs in README.md, provided that won't become incorrect if something changes in Amazon's pricing or in boto.

FYI, I did a local modification of boto to change the default from 4 MB to 64 MB. I had read the docs in the glacier-cli package and, more heavily, on the AWS site, and although the per-request cost isn't huge, I figured 4 MB chunks didn't make a lot of sense for my use case.
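As a rough illustration of what that change buys (reusing the 32 GiB figure from the original report; this is just arithmetic, not boto code):

```python
GIB, MIB = 1024 ** 3, 1024 ** 2
archive = 32 * GIB

parts_4mib = archive // (4 * MIB)    # boto's default part size
parts_64mib = archive // (64 * MIB)  # after the local 64 MB modification
print(parts_4mib, parts_64mib)       # 8192 vs 512: 16x fewer upload requests
```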

Oh and before I forget thank you for the tool.

I've wrapped it to handle chunking up (so for directories of tiny files I don't end up with hundreds of thousands of expensive AWS archives) and encrypting before uploading. But having a CLI I could call to do the uploads made things quite nice.

It would be nice if boto allowed you to set `DefaultPartSize` for all of the external calls, and then have glacier-cli expose that.

Steve

Hi,

Am Samstag, den 26.10.2013, 19:02 -0700 schrieb basak:

> What was the actual cost difference between what you expected and what really happened? http://aws.amazon.com/glacier/pricing/ says "$0.050 per 1,000 requests". It looks like boto defaults to a 4 MiB default part size, so for a 32 GB upload, this would be around 8192 parts so 45 cents. Is this the difference that you are concerned about?

Yes, indeed. Not that I mind the 50¢, but what if I had done that daily for a month and only then checked the balance?

> The API limit is 4 GiB per request. I think the multipart upload functionality got added to boto in the first place because uploads were failing for users uploading more than 4 GiB to an archive. Some have been using parallel uploads to speed them up, though glacier-cli doesn't do this (yet). Perhaps the 4 MiB default should be reconsidered, or there should be an option to change it?

Ah, I got confused: there is a --multipart-size option in glacier-cli, but it is only for the archive retrieve subcommand...

So would it be possible to add a command-line option to glacier-cli to set the chunk size for uploads as well?

> So, given all of that information, what needs to be done in glacier-cli? I'll happily accept a pull request to clarify costs in README.md, provided that won't become incorrect if something changes in Amazon's pricing or in boto.

How about:

Costs
-----

Before you use Amazon Glacier, you should familiarise yourself with [how much it costs](http://aws.amazon.com/glacier/pricing/). Note that:
  - archive retrieval costs are complicated and [may be a lot more than you expect](http://www.daemonology.net/blog/2012-09-04-thoughts-on-glacier-pricing.html);
  - files are uploaded in chunks, so uploading an archive causes many requests. The size of the parts is determined by the boto library; check `DefaultPartSize` in [the documentation](http://boto.readthedocs.org/en/latest/ref/glacier.html).

Greetings,
Joachim