dotnet / sdk-container-builds

Libraries and build tooling to create container images from .NET projects using MSBuild

Home Page: https://learn.microsoft.com/en-us/dotnet/core/docker/publish-as-container

Support Incremental Build of Container Images

baronfel opened this issue

Right now there are roughly two steps to container creation:

  • determine metadata properties for the image/basic validation (ParseContainerProperties target)
  • image creation (PublishContainer target)

As a result, repeat publishes redo work:

  • image manifests are re-downloaded
  • image manifests are re-negotiated to find the best single image
  • image configuration is re-downloaded
  • build assets are re-tar'd
  • layers are re-uploaded (though this can be quite quick because the registries do sanity-checking)
  • manifests/configs are re-uploaded

Each of these steps (and possibly more granular steps!) should be factored out into separate tasks/targets, and each step should establish clear inputs/outputs so that MSBuild incrementality can have its greatest effect. This will give two main benefits:

  • faster publishes for the single-image case
  • easier implementation of multi-manifest publishing due to more reusable components

In addition, more of the targets would be able to run natively in Visual Studio - only the layer-creation step would need to be re-implemented. Greater code-sharing in this way should lead to a more unified experience between VS and the CLI.
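
For illustration, a rough sketch of one factored-out step with explicit inputs and outputs, assuming a hypothetical `CreateLayerTarball` task (its name and properties are made up for this example). In a real target chain the skip would normally come for free from `Inputs`/`Outputs` on the wrapping target; the manual timestamp check below just shows the same idea at the task level.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

// Hypothetical task name; illustrates "clear inputs/outputs" for the layer-creation step.
public sealed class CreateLayerTarball : Task
{
    // Input: the published application files that would make up the app layer.
    [Required]
    public ITaskItem[] PublishFiles { get; set; } = Array.Empty<ITaskItem>();

    // Output location: where the layer tarball gets written.
    [Required]
    public string LayerTarballPath { get; set; } = "";

    public override bool Execute()
    {
        // Up-to-date check: if no input is newer than the existing tarball, skip the re-tar.
        if (File.Exists(LayerTarballPath))
        {
            DateTime newestInput = PublishFiles
                .Select(item => File.GetLastWriteTimeUtc(item.ItemSpec))
                .DefaultIfEmpty(DateTime.MinValue)
                .Max();

            if (newestInput <= File.GetLastWriteTimeUtc(LayerTarballPath))
            {
                Log.LogMessage(MessageImportance.Low, "Layer tarball is up to date; skipping.");
                return true;
            }
        }

        // ... build the tarball from PublishFiles here (omitted in this sketch) ...
        return true;
    }
}
```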

Note that caching image manifests can be dangerous: you can imagine an incremental build that wakes up once a month to do an incremental containerization. Nothing on the local box has changed, but you'd still want to fetch the latest image manifest definition and build a new image on top of it (with the same layer tarball that was used last time).

I might actually push for layer determinism before incrementality, since as you say the registry should handle the expensive part of layer deduplication.
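
As a minimal sketch of what layer determinism asks for: the same inputs should yield a byte-identical layer tarball, so the registry (or a local cache) can deduplicate it by digest. This is not the SDK's actual implementation; the directory layout and the `app/` prefix are illustrative, and compression is left out.

```csharp
using System;
using System.Formats.Tar;
using System.IO;
using System.Linq;

static class DeterministicLayer
{
    public static void Write(string publishDir, string tarPath)
    {
        using var output = File.Create(tarPath);
        using var writer = new TarWriter(output);

        // Stable ordering: sort entries by their in-archive path, not filesystem order.
        var files = Directory.EnumerateFiles(publishDir, "*", SearchOption.AllDirectories)
            .Select(f => (Source: f,
                          EntryName: "app/" + Path.GetRelativePath(publishDir, f).Replace('\\', '/')))
            .OrderBy(f => f.EntryName, StringComparer.Ordinal);

        foreach (var (source, entryName) in files)
        {
            // Normalize everything that would otherwise vary between builds:
            // fixed timestamp and fixed permissions, no owner information.
            using var data = File.OpenRead(source);
            var entry = new UstarTarEntry(TarEntryType.RegularFile, entryName)
            {
                ModificationTime = DateTimeOffset.UnixEpoch,
                Mode = UnixFileMode.UserRead | UnixFileMode.UserWrite |
                       UnixFileMode.GroupRead | UnixFileMode.OtherRead,
                DataStream = data
            };
            writer.WriteEntry(entry);
        }
    }
}
```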

Fair point; there's some overlap with #114 in this discussion as well. Most classic container tooling caches manifests by default, but that's at odds with the 'secure/latest by default' intended use case. The intent I have with this issue is more to share work during a container publish, especially a multi-project/multi-RID publish, so I'd want to reduce the amount of manifest-fetching done across that entire set of operations.

An excellent design consideration! We might consider using RegisterTaskObject with a Build lifetime to cache some of the fetches for the duration of a build/publish operation.
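
A hedged sketch of that idea, assuming a hypothetical `ResolveBaseImageManifest` task with a stubbed-out registry call: a manifest fetched by one project (or RID) gets registered with `RegisteredTaskObjectLifetime.Build` so the rest of the same build can reuse it.

```csharp
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

// Hypothetical task name; not the SDK's actual code.
public sealed class ResolveBaseImageManifest : Task
{
    [Required]
    public string BaseImageReference { get; set; } = "";

    [Output]
    public string ResolvedManifest { get; set; } = "";

    public override bool Execute()
    {
        string cacheKey = $"ContainerBaseImageManifest|{BaseImageReference}";

        // Reuse a manifest another project/RID already fetched during this build.
        if (BuildEngine4.GetRegisteredTaskObject(cacheKey, RegisteredTaskObjectLifetime.Build) is string cached)
        {
            ResolvedManifest = cached;
            return true;
        }

        ResolvedManifest = FetchManifestFromRegistry(BaseImageReference);
        BuildEngine4.RegisterTaskObject(cacheKey, ResolvedManifest,
            RegisteredTaskObjectLifetime.Build, allowEarlyCollection: false);
        return true;
    }

    // Placeholder for the real registry call (manifest list negotiation, auth, etc.).
    private static string FetchManifestFromRegistry(string reference) => "{ }";
}
```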

I wonder how this would work together with GHA caching, which Docker currently supports as an experimental feature: https://docs.docker.com/build/cache/backends/gha/

We probably wouldn't interop with that feature, at least not initially. We already don't reuse the Docker cache (and neither do our contemporaries like Jib/ko).

Additional details: dotnet/sdk#39196 (comment)

If we were doing better MSBuild incrementality, you could imagine a situation where we'd do:

  • compute desired container base image (existing target)
  • fetch + resolve manifest (list) to single base image manifest from the base image (new step)
  • compute container config data (existing ComputeContainerConfig target)
  • download base layers (in parallel) (new step)
  • create the image/push to appropriate storage (existing target, much less work done inside it)

all as separate tasks that could take advantage of MSBuild incrementality and parallelism. In this world, none of the label-generation flags would ever need to be passed to the task. So I'm viewing this as an intermediate stage.
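
As a rough sketch of the "download base layers (in parallel)" step, assuming the standard OCI distribution blob endpoint (`GET /v2/<name>/blobs/<digest>`). Registry auth, digest verification, and error handling are omitted, and all names here are illustrative rather than actual SDK code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class BaseLayerDownloader
{
    public static async Task DownloadLayersAsync(
        HttpClient client, string registry, string repository,
        IReadOnlyList<string> layerDigests, string layerCacheDir)
    {
        Directory.CreateDirectory(layerCacheDir);

        var downloads = new List<Task>();
        foreach (var digest in layerDigests)
        {
            var destination = Path.Combine(layerCacheDir, digest.Replace(':', '_'));
            if (File.Exists(destination))
                continue; // layer already cached locally; nothing to fetch

            downloads.Add(DownloadOneAsync(client,
                $"https://{registry}/v2/{repository}/blobs/{digest}", destination));
        }

        // Fetch the remaining layers concurrently.
        await Task.WhenAll(downloads);
    }

    private static async Task DownloadOneAsync(HttpClient client, string url, string destination)
    {
        using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        await using var file = File.Create(destination);
        await response.Content.CopyToAsync(file);
    }
}
```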