Add snapshot image after initialization

Question

johanneskiesel opened this issue 5 years ago

Currently, prepare commits the intermediate image after indexing, but not after initialization:
https://github.com/osirrc2019/jig/blob/4e3765cd59b0869c354b2d7c6f9da826624e470e/run.py#L47

Committing after initialization as well could save time, network traffic, and disk space: thanks to the layered file system, the downloaded files would be stored only once rather than once per image.

The tag could be something like "{}-initialized".format(args.tag)
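
A minimal sketch of how such a per-phase commit could look. It assumes the container object exposes commit(repository=..., tag=...), as containers in the Docker SDK for Python do; snapshot_tag and commit_phase are hypothetical helper names, not functions that exist in jig.

```python
def snapshot_tag(tag, phase):
    """Build the tag for a phase snapshot, e.g. 'latest' -> 'latest-initialized'."""
    return "{}-{}".format(tag, phase)


def commit_phase(container, repo, tag, phase):
    """Commit the container's current state as <repo>:<tag>-<phase>.

    Assumption: `container` has a commit(repository=..., tag=...) method,
    like a Docker SDK for Python container.
    """
    new_tag = snapshot_tag(tag, phase)
    container.commit(repository=repo, tag=new_tag)
    # Return the full image reference so the caller can log or reuse it.
    return "{}:{}".format(repo, new_tag)
```

Calling commit_phase(container, "anserini-test", "latest", "initialized") right after the init step would then produce an image reference of the form repo:tag-initialized, alongside the existing commit after indexing.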

Jimmy Lin · Answer 1 · Thu Mar 28 2019 04:07:48 GMT+0800 (China Standard Time)

I'm 👎 on this but open to discussion.

Ryan Clancy · Answer 2 · Fri Mar 29 2019 08:27:39 GMT+0800 (China Standard Time)

If we did this, we would have two images. For example:

anserini-test:latest-initialized after init is called
anserini-test:latest-indexed after index is called

where the first image would be the base image for the second.

I think this would lead to some odd lifecycle management, where we'd need to update the base image of the second to be the updated (after re-init) first image, if that's even possible. Another approach may be to start a container from the second image and re-run the init script, but this again can get complicated (init scripts would then have to be idempotent and clean up existing files before downloading new ones).

I'm 👎 on this too for now as it would add a lot of hidden complexity.

Johannes Kiesel · Answer 3 · Fri Mar 29 2019 14:36:17 GMT+0800 (China Standard Time)

Maybe there is some confusion here: why would you want to re-init an image? I thought init is just about setup. So my question is: why would I want to run setup every time I index a collection, when I can just start from a snapshot taken after setup was completed?

But in case you do need to re-init an image (I can imagine that happening if you encountered an error or so), why can't you just create both latest-initialized and latest-indexed again? I see you would need an additional "--purge" parameter (or so) that allows people to force an init even if there is already an initialized image.
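
A rough sketch of that decision, using argparse. The "--purge" flag, the "--repo" and "--tag" arguments, and the needs_init helper are all hypothetical here and only illustrate the idea; they are not existing jig options.

```python
import argparse


def build_parser():
    """Parser with a hypothetical --purge flag that forces a fresh init."""
    parser = argparse.ArgumentParser(prog="prepare-sketch")
    parser.add_argument("--repo", required=True)
    parser.add_argument("--tag", default="latest")
    parser.add_argument(
        "--purge",
        action="store_true",
        help="re-run init even if an initialized image already exists",
    )
    return parser


def needs_init(existing_images, repo, tag, purge):
    """Return True if init must run: forced by purge, or no snapshot exists yet.

    `existing_images` is a collection of 'repo:tag' strings that the caller
    is assumed to have collected from the local image store.
    """
    initialized = "{}:{}-initialized".format(repo, tag)
    return bool(purge or initialized not in existing_images)
```

With this, a second run without "--purge" would skip init and start indexing from the initialized snapshot, while a run with "--purge" would rebuild both images.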

Jimmy Lin · Answer 4 · Fri Mar 29 2019 19:05:56 GMT+0800 (China Standard Time)

I think the tradeoff is more complex lifecycle management... I think we're assuming that init/index will be done once and that's it.

I suppose with all the bells and whistles we can bind each subcommand to a hook and allow committing at each phase in a flexible manner? I'm inclined to punt on this for now though...
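
For illustration only, binding subcommands to hooks could be sketched like this. PhaseRunner and its methods are hypothetical names; a commit would simply be one hook registered for a phase.

```python
class PhaseRunner:
    """Runs a phase action, then any hooks registered for that phase."""

    def __init__(self):
        self._hooks = {}  # phase name -> list of callables taking the phase name

    def on(self, phase, hook):
        """Register a hook to run after the given phase completes."""
        self._hooks.setdefault(phase, []).append(hook)

    def run(self, phase, action):
        """Execute the phase action, then its hooks in registration order."""
        result = action()
        for hook in self._hooks.get(phase, []):
            hook(phase)
        return result
```

Registering a commit hook for "init" and another for "index" would give the two snapshots discussed above, and leaving a phase without hooks keeps today's behaviour for it.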

Johannes Kiesel · Answer 5 · Fri Mar 29 2019 21:26:47 GMT+0800 (China Standard Time)

I see, and I want to say that it is not my intention to press this issue (a point that might have been lost in moving from the original mail to this issue). I'm well aware that this can be added later on without a problem (it requires no change to the specification), so you can just wait and see whether index is done just once or more often.

Jimmy Lin · Answer 6 · Mon Apr 01 2019 00:07:41 GMT+0800 (China Standard Time)

No worries! Thanks for your contributions!