dotnet / sdk-container-builds

Libraries and build tooling to create container images from .NET projects using MSBuild

Home Page: https://learn.microsoft.com/en-us/dotnet/core/docker/publish-as-container

Support Incremental Build of Container Images

baronfel opened this issue

Right now there are roughly two steps to container creation:

  • determine metadata properties for the image/basic validation (ParseContainerProperties target)
  • image creation (PublishContainer target)

As a result, repeat publishes redo work:

  • image manifests are re-downloaded
  • image manifests are re-negotiated to find the best single image
  • image configuration is re-downloaded
  • build assets are re-tar'd
  • layers are re-uploaded (though this can be quite quick because the registries do sanity-checking)
  • manifests/configs are re-uploaded

Each of these steps (and possibly more granular steps!) should be factored out into separate tasks/targets, and each step should establish clear inputs/outputs so that MSBuild incrementality can have its greatest effect. This will give two main benefits:

  • faster publishes for the single-image case
  • easier implementation of multi-manifest publishing due to more reusable components

In addition, more of the targets would be able to run natively in Visual Studio - only the layer-creation step would need to be re-implemented. Greater code-sharing in this way should lead to a more unified experience between VS and the CLI.
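
For illustration, a rough sketch of one factored-out step with explicit inputs and outputs, assuming a hypothetical `CreateLayerTarball` task (its name and properties are made up for this example). In a real target chain the skip would normally come for free from `Inputs`/`Outputs` on the wrapping target; the manual timestamp check below just shows the same idea at the task level.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

// Hypothetical task name; illustrates "clear inputs/outputs" for the layer-creation step.
public sealed class CreateLayerTarball : Task
{
    // Input: the published application files that would make up the app layer.
    [Required]
    public ITaskItem[] PublishFiles { get; set; } = Array.Empty<ITaskItem>();

    // Output location: where the layer tarball gets written.
    [Required]
    public string LayerTarballPath { get; set; } = "";

    public override bool Execute()
    {
        // Up-to-date check: if no input is newer than the existing tarball, skip the re-tar.
        if (File.Exists(LayerTarballPath))
        {
            DateTime newestInput = PublishFiles
                .Select(item => File.GetLastWriteTimeUtc(item.ItemSpec))
                .DefaultIfEmpty(DateTime.MinValue)
                .Max();

            if (newestInput <= File.GetLastWriteTimeUtc(LayerTarballPath))
            {
                Log.LogMessage(MessageImportance.Low, "Layer tarball is up to date; skipping.");
                return true;
            }
        }

        // ... build the tarball from PublishFiles here (omitted in this sketch) ...
        return true;
    }
}
```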

Note that caching image manifests can be dangerous: you can imagine an incremental build that wakes up once a month to do an incremental containerization. Nothing on the local box has changed, but you'd still want to fetch the latest image manifest definition and build a new image on top of it (with the same layer tarball that was used last time).

I might actually push for layer determinism before incrementality, since as you say the registry should handle the expensive part of layer deduplication.
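
As a minimal sketch of what layer determinism asks for: the same inputs should yield a byte-identical layer tarball, so the registry (or a local cache) can deduplicate it by digest. This is not the SDK's actual implementation; the directory layout and the `app/` prefix are illustrative, and compression is left out.

```csharp
using System;
using System.Formats.Tar;
using System.IO;
using System.Linq;

static class DeterministicLayer
{
    public static void Write(string publishDir, string tarPath)
    {
        using var output = File.Create(tarPath);
        using var writer = new TarWriter(output);

        // Stable ordering: sort entries by their in-archive path, not filesystem order.
        var files = Directory.EnumerateFiles(publishDir, "*", SearchOption.AllDirectories)
            .Select(f => (Source: f,
                          EntryName: "app/" + Path.GetRelativePath(publishDir, f).Replace('\\', '/')))
            .OrderBy(f => f.EntryName, StringComparer.Ordinal);

        foreach (var (source, entryName) in files)
        {
            // Normalize everything that would otherwise vary between builds:
            // fixed timestamp and fixed permissions, no owner information.
            using var data = File.OpenRead(source);
            var entry = new UstarTarEntry(TarEntryType.RegularFile, entryName)
            {
                ModificationTime = DateTimeOffset.UnixEpoch,
                Mode = UnixFileMode.UserRead | UnixFileMode.UserWrite |
                       UnixFileMode.GroupRead | UnixFileMode.OtherRead,
                DataStream = data
            };
            writer.WriteEntry(entry);
        }
    }
}
```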

Fair point; there's some overlap with #114 in this discussion as well. Most classic container tooling caches manifests by default, but that's at odds with the 'secure/latest by default' intended use case. The intent I have with this issue is more to share work during a container publish, especially a multi-project/multi-RID publish, so I'd want to reduce the amount of manifest-fetching done across that entire set of operations.

An excellent design consideration! We might consider using RegisterTaskObject with a Build lifetime to cache some of the fetches for the duration of a build/publish operation.
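
A hedged sketch of that idea, assuming a hypothetical `ResolveBaseImageManifest` task with a stubbed-out registry call: a manifest fetched by one project (or RID) gets registered with `RegisteredTaskObjectLifetime.Build` so the rest of the same build can reuse it.

```csharp
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

// Hypothetical task name; not the SDK's actual code.
public sealed class ResolveBaseImageManifest : Task
{
    [Required]
    public string BaseImageReference { get; set; } = "";

    [Output]
    public string ResolvedManifest { get; set; } = "";

    public override bool Execute()
    {
        string cacheKey = $"ContainerBaseImageManifest|{BaseImageReference}";

        // Reuse a manifest another project/RID already fetched during this build.
        if (BuildEngine4.GetRegisteredTaskObject(cacheKey, RegisteredTaskObjectLifetime.Build) is string cached)
        {
            ResolvedManifest = cached;
            return true;
        }

        ResolvedManifest = FetchManifestFromRegistry(BaseImageReference);
        BuildEngine4.RegisterTaskObject(cacheKey, ResolvedManifest,
            RegisteredTaskObjectLifetime.Build, allowEarlyCollection: false);
        return true;
    }

    // Placeholder for the real registry call (manifest list negotiation, auth, etc.).
    private static string FetchManifestFromRegistry(string reference) => "{ }";
}
```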

I wonder how this would work together with GHA caching, which Docker currently supports as an experimental feature: https://docs.docker.com/build/cache/backends/gha/

We probably wouldn't interop with that feature, at least not initially. We already don't reuse the Docker cache (and neither do our contemporaries like Jib/ko).

Additional details: dotnet/sdk#39196 (comment)

If we were doing better MSBuild incrementality, you could imagine a situation where we'd do:

  • compute desired container base image (existing target)
  • fetch + resolve manifest (list) to single base image manifest from the base image (new step)
  • compute container config data (existing ComputeContainerConfig target)
  • download base layers (in parallel) (new step)
  • create the image/push to appropriate storage (existing target, much less work done inside it)

all as separate tasks that could take advantage of MSBuild incrementality and parallelism. In this world, none of the label-generation flags would ever need to be passed to the task. So I'm viewing this as an intermediate stage.
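
As a rough sketch of the "download base layers (in parallel)" step, assuming the standard OCI distribution blob endpoint (`GET /v2/<name>/blobs/<digest>`). Registry auth, digest verification, and error handling are omitted, and all names here are illustrative rather than actual SDK code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class BaseLayerDownloader
{
    public static async Task DownloadLayersAsync(
        HttpClient client, string registry, string repository,
        IReadOnlyList<string> layerDigests, string layerCacheDir)
    {
        Directory.CreateDirectory(layerCacheDir);

        var downloads = new List<Task>();
        foreach (var digest in layerDigests)
        {
            var destination = Path.Combine(layerCacheDir, digest.Replace(':', '_'));
            if (File.Exists(destination))
                continue; // layer already cached locally; nothing to fetch

            downloads.Add(DownloadOneAsync(client,
                $"https://{registry}/v2/{repository}/blobs/{digest}", destination));
        }

        // Fetch the remaining layers concurrently.
        await Task.WhenAll(downloads);
    }

    private static async Task DownloadOneAsync(HttpClient client, string url, string destination)
    {
        using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        await using var file = File.Create(destination);
        await response.Content.CopyToAsync(file);
    }
}
```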