moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems

Home Page: https://mobyproject.org/

discussion: builder future: buildkit

tonistiigi opened this issue · comments

I'm creating a new issue here about the buildkit proposal discussed in #32550 (comment). Although the implementation for this wouldn't be part of this repo, I don't know a better place to discuss it atm. It's part of the Moby effort to eventually break up the monolith and unblock innovation on build use cases.

The proposal

(originally from https://gist.github.com/tonistiigi/059fc72c4630f066d94dafb5e0e70dc6)

Buildkit is a proposal to split the docker build experience out into a separate project, allowing different users to collaborate on the underlying technology and to reuse and customize it in different ways.

One of the main design goals of buildkit is to separate frontend and backend concerns during a build process. A frontend is something designed for users to describe their build definition. A backend solves the problem of finding the most efficient way to execute a common low-level description of the build operations that has been prepared for it by the frontends.

The purpose of buildkit is not to be an arbitrary task runner. Instead, buildkit solves the problem of converting source code to an artifact in a self-contained, portable, reproducible, and efficient way. Invoking the builder should be traceable to immutable sources, and invoking it shouldn't have any side effects. Buildkit will support intelligent caching of artifacts from previous invocations so it can be used efficiently in a developer workflow.

Buildkit is meant to be used as a long-running service. It is optimized for parallel execution of complex projects and building multiple projects at the same time.

Design draft:

Buildkit is separated into the following subcomponents:

  • sources - getting data from remote sources
  • frontends - preparing the user's build definition into the low-level format
  • solver - finds the most efficient way to execute the build instruction graph and leverages caching for subsequent invocations
  • worker - component in charge of actually running the modification step on a source
  • exporter - component for getting the final results back from the builder
  • snapshots - implementation for filesystem manipulations while the builder executes
  • cache manager - component managing the recently used artifacts for efficient rebuilds
  • controlAPI - definition for linking multiple builders together for nested invocation

Connection with docker platform

Buildkit is meant to become the next-generation backend implementation for the docker build command and the github.com/docker/docker/builder package. This doesn't mean any changes to the Dockerfile format, as buildkit draws a boundary between build backends and frontends; Dockerfile would be one of the frontend implementations.

When invoked from the Docker CLI, buildkit would be capable of exposing the client's context directory as a source and using Docker containers as a worker. The snapshots would be backed by Docker's layer store (containerd snapshot drivers). End results from the builder would be exported as Docker images.

Frontends

A frontend is a component that takes in a user-provided build definition, parses it, and prepares a generalized definition for the low-level builder.

Buildkit supports multiple frontends. The most common example of a frontend is the Dockerfile. A frontend also has access to the other components of the builder: it can access the sources directly and can store/get resources from the cache. For example, for the Dockerfile frontend to correctly parse the FROM command, it needs to request the config for an image.
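
As an illustration only, a frontend could be modeled as a component that is handed the user definition plus access to sources and the cache, and returns the low-level graph described in the next section. The interface name and signature below are assumptions, not part of the proposal:

import "context"

// Frontend is a hypothetical sketch of the frontend contract.
type Frontend interface {
  // Solve parses a user-provided definition (e.g. Dockerfile bytes), may
  // consult sources and the cache (e.g. to resolve an image config for FROM),
  // and returns the root of the low-level build graph for the solver.
  Solve(ctx context.Context, def []byte, sources Sources, cache CacheManager) (*Op, error)
}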

Solver/Low-level builder

The core part of the builder is a solver that takes a DAG of low-level build instructions from the frontend and finds a way to execute them in the most efficient manner while keeping the cache for subsequent invocations.

For this, the graph of build instructions is loaded into a content-addressable store. Every item in that store can have dependencies on previous items, which makes sure that no definitions are duplicated. To start a build, a root node from that graph is asked to be solved with the provided worker options. That internally calls the same action for its dependencies, and so on.

While solving an instruction, a cache key is computed to see if a result for the instruction can already be found without computing the step. If it is found, the snapshot associated with the cache key can be used as the result directly. After every instruction, the result of the operation is stored under the same cache key for future use.
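
To illustrate the idea (this is only a sketch, and the use of the go-digest package is an assumption; the actual keying scheme is up to the solver), a cache key could be derived from the instruction's own definition combined with the cache keys of its inputs, so that identical steps over identical inputs resolve to the same key:

import (
  "crypto/sha256"

  digest "github.com/opencontainers/go-digest"
)

// cacheKey is an illustrative sketch: hash the serialized instruction together
// with the cache keys of its dependencies to get a content-addressed key.
func cacheKey(opDef []byte, inputKeys []digest.Digest) digest.Digest {
  h := sha256.New()
  h.Write(opDef)
  for _, k := range inputKeys {
    h.Write([]byte(k.String()))
  }
  return digest.NewDigest(digest.SHA256, h)
}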

The goal is to:

  • Minimize duplication between builder invocations that share common steps.
  • Minimize duplication between build steps that return identical results.
  • Find efficient parallelization of steps

Supported operations for LLB:

LLB is optimized for simplicity. The main operation that it supports is running a process in the context of one snapshot and capturing the modifications this process made. To simplify and optimize the implementation, there are built-in operations for copying data from one snapshot to another and for accessing data from one of the remote sources known to the builder.

type Input struct {
  Base  Op  // operation that produces this input
  Index int // index of that operation's output snapshot
}

type Op struct {
  Deps []Input             // inputs this operation depends on
  Outs []snapshot.Snapshot // output snapshots produced by this operation
}

type ExecOp struct {
  Op
  Meta   ExecMeta // description of the process to run
  Mounts []Mount  // mapping for inputs to paths
}

type CopyOp struct {
  Op
  Sources []string // paths to copy from the input snapshots
  Dest    string   // destination path in the output snapshot
}

type SourceOp struct {
  Op
  Identifier string // e.g. an image, git, or http source identifier
}

The low-level builder only works on snapshots. There are no methods for controlling image metadata changes; image metadata can be managed by a frontend, should it be needed. The only component that needs to know about the image format is the image exporter. The ExecMeta structure is defined by buildkit and contains a minimal set of properties describing a running process. If more properties are needed (host networking, etc.) they must be set when initializing the worker, and the DAG solver has no knowledge of them.

ExecOp can depend on multiple snapshots as its inputs; one of them would be mounted at / and used as the root filesystem. Every operation can also export multiple output snapshots.
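
As a rough sketch of how a frontend might express a simple build in this format (the identifier syntax, field values, and wiring below are illustrative assumptions, and the per-input path mappings that would go into Mounts are omitted):

// Illustrative only: "fetch sources from git, build them inside a golang
// image, then copy the produced binary out of the result".
var appSource = SourceOp{Identifier: "git://github.com/example/app.git"}
var buildImage = SourceOp{Identifier: "docker-image://docker.io/library/golang:latest"}

var buildStep = ExecOp{
  // first input becomes the root filesystem, second is the source snapshot;
  // the path mapping for each input would be described in Mounts
  Op:   Op{Deps: []Input{{Base: buildImage.Op, Index: 0}, {Base: appSource.Op, Index: 0}}},
  Meta: ExecMeta{Args: []string{"go", "build", "-o", "/out/app", "./src"}},
}

var releaseStep = CopyOp{
  Op:      Op{Deps: []Input{{Base: buildStep.Op, Index: 0}}},
  Sources: []string{"/out/app"}, // copy the binary from the build step's output snapshot
  Dest:    "/app",
}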

Another operation supports invoking other builders from within a build operation, enabling nested invocation of builders. This is covered in more detail in the ControlAPI section.

Sources

Sources is a component that allows registering transport methods with the builder that prepare remote data as a snapshot. A build operation can refer to remote data by an identifier; that identifier is used to find the registered source provider that has the actual implementation.

Supported built-in sources include docker images, git repositories, http archives, and local directories. It is likely that a source implementation uses the cache from previous invocations to speed up access to the data: the docker image source would skip pulling image layers that it has pulled before, and the git source could reuse a previously pulled repo and only pull in incremental changes.

When integrated with docker build, an extra source would be available that allows access to the files sent by the Docker client.
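
A minimal sketch of how source registration could look, assuming identifiers carry a scheme prefix such as git:// or docker-image:// (the interface and registry below are illustrative, not part of the proposal):

import "context"

// Source is a hypothetical provider interface: it turns an identifier into a
// local snapshot, reusing previously fetched data where possible.
type Source interface {
  Snapshot(ctx context.Context, identifier string) (snapshot.Snapshot, error)
}

// registry of providers keyed by identifier scheme, e.g. "git", "docker-image"
var sources = map[string]Source{}

func RegisterSource(scheme string, s Source) {
  sources[scheme] = s
}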

Worker

Worker is a component tasked with running a build step. The only required capabilities are moving data between snapshots and executing a command with the correct data mounts.

type ExecMeta struct {
  Args []string // command and arguments to execute
  Env  []string // environment variables
  User string   // user to run the process as
  Wd   string   // working directory
  Tty  bool     // whether to allocate a tty
  // DisableNetworking bool
}

Usually a worker would run a container to execute the process, but that is not a requirement set by the builder.
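
A possible shape for the worker contract, assuming the Mount and snapshot types from the LLB section (this is a sketch; the interface name and signature are not from the proposal):

import "context"

// Worker executes a single build step: run the process described by meta with
// the given snapshot mounts and return snapshots capturing its modifications.
type Worker interface {
  Exec(ctx context.Context, meta ExecMeta, mounts []Mount) ([]snapshot.Snapshot, error)
}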

Exporter

An exporter is a post-step that runs before data is returned. Unlike docker build, where every build results in an image being added to Docker's image store, buildkit can export the build results in other formats as well.

That means that the build result may be a plugin, an OCI image bundle, or maybe just a file or a directory.
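
For illustration, the exporter boundary could be as small as the following sketch, under the assumption that exporters only need the final snapshots (the name and signature are illustrative):

import "context"

// Exporter converts final build snapshots into a target representation, e.g.
// a Docker image, an OCI image layout, a plugin bundle, or a plain directory.
type Exporter interface {
  Export(ctx context.Context, results []snapshot.Snapshot) error
}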

Snapshots

Snapshots cover the implementation of the filesystem modifications needed during the build. The component supports plugging in different backends. When buildkit is used as part of docker build, it would use Docker's layer store or containerd snapshot drivers as a backend, but alternative implementations could be provided; for example, a build would probably work quite well with a FUSE-based snapshot backend.

The snapshots API uses reference counting because the same data may be used outside the build as well. When a part of the system (e.g. a build operation) takes a reference to a snapshot, it can't be deleted by anything else.
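
One way to express that reference-counting contract (an illustrative sketch only; the type and method names are assumptions):

// SnapshotRef is a hypothetical handle to a snapshot. While at least one
// reference is held, the underlying data cannot be removed.
type SnapshotRef interface {
  Snapshot() snapshot.Snapshot // access the underlying snapshot
  Release() error              // drop this reference; data becomes collectable once unreferenced
}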

Cache

The persistent storage used by buildkit is managed automatically by a cache manager component. The user can specify a simple cache policy that is used by the garbage collector to clean up unneeded resources.

type GCPolicy struct {
    MaxSize         uint64        // upper bound on total cache size
    MaxKeepDuration time.Duration // how long unused data may be kept before collection
}

The build cache contains snapshots previously accessed by the builder and some metadata for the operations (cache keys) referring to these snapshots. If a builder has stopped using a snapshot, it would, before releasing it, call RetainSnapshot(snapshot.Snapshot, CachePolicy) to make the cache manager responsible for keeping the snapshot data and releasing it once free space is needed.

The user also has control to see what is currently tracked by the cache manager and to manually prune its contents.

type CacheManager interface {
    DiskUsage(context.Context) ([]CacheRecord, error) // inspect what is currently tracked
    Prune(context.Context, CacheSelector) error       // manually remove selected records
    GC(context.Context, GCPolicy) error               // enforce a garbage collection policy
}
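
An illustrative use of the interfaces above (the size and duration values, and the assumption that MaxSize is measured in bytes, are my own):

import (
  "context"
  "time"
)

// pruneCache is a sketch: enforce a policy that keeps the cache under ~10GB
// and drops anything that hasn't been used for a week.
func pruneCache(ctx context.Context, cm CacheManager) error {
  policy := GCPolicy{
    MaxSize:         10 * 1024 * 1024 * 1024, // assumed to be bytes
    MaxKeepDuration: 7 * 24 * time.Hour,
  }
  return cm.GC(ctx, policy)
}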

Cache import/export

Build cache can be exported out of buildkit and imported on another machine. It can be stored in a registry using a distribution manifest.

ExportCache(selector CacheSelector) ([]byte, []snapshot.Snapshot, error)

ExportCache would export a config object with metadata about how the snapshots are referenced by operation cache keys. That data could be pushed to a registry, with every snapshot being pushed as a separate blob. A cache importer can read back this config and expose the operation cache to the currently running builder action. If a cache key requested by an operation is not found locally but exists in the imported configuration, the snapshot associated with it can be pulled in from the registry.
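
For illustration, the exported config object might conceptually be a mapping from operation cache keys to the registry blobs holding their snapshots. The struct below is a hypothetical shape, not a defined format, and the go-digest package is an assumption:

import digest "github.com/opencontainers/go-digest"

// CacheConfig is a hypothetical shape for the exported cache metadata.
type CacheConfig struct {
  Records []CacheRecordRef
}

type CacheRecordRef struct {
  CacheKey digest.Digest // operation cache key computed by the solver
  Blob     digest.Digest // registry blob containing the associated snapshot
}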

ControlAPI

Control API is an API layer for controlling the builder while it is running as a long-running service. It supports invoking a build job and inspecting how a build job would execute. The user should be able to query a build target without executing it and see the vertex graph of all operations that would be executed and whether they are already backed by the cache.

Load(context.Context, []Op) error
Build(context.Context, digest.Digest, bool) ([]Vertex, error)

By defining a common interface, a client program can be used with multiple builder implementations. This also enables supporting nested builder invocations as a build operation. That would be similar to ExecOp, but instead of executing a process, the builder would issue a controlapi.Build request.
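
A sketch of what such a nested-build operation could look like in LLB terms (hypothetical; only the controlapi signatures above are from the proposal, and the go-digest package is an assumption):

import digest "github.com/opencontainers/go-digest"

// BuildOp is a hypothetical LLB operation: instead of executing a process,
// solving it issues a controlapi Build request for the referenced target.
type BuildOp struct {
  Op
  Target digest.Digest // root vertex of the nested build graph to solve
}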


Action items:

  • High-level design discussion, Q&A
  • Create a separate repo?
  • Gather a group of people who want to participate in design/development of this

@dnephin @AkihiroSuda @vdemeester @duglin @cpuguy83 @justincormack @simonferquel @alexellis @dmcgowan @mlaventure

This is something that interests me, thanks for the ping. 👍 I also wondered if you had thought about conducting some kind of engagement with community projects that are doing work with the builder like Dockramp etc?

Btw. I think this will change? github.com/docker/docker/builder -> github.com/moby/moby/builder

@alexellis This encourages new community projects that want to either provide a special interface for declaring a build definition (with frontends) or define custom low-level build functionality (with nested invocation). A project like Dockramp that reimplements docker build features could just be a specific configuration of buildkit with a Dockerfile frontend that uses the Docker Remote API to implement the worker/snapshotter.

I think this will change?

Yeah, things are in flux atm. Buildkit would be an open Moby project, docker build would just be one opinionated configuration of it.

👍

It might be good to consider Build Context as well.
It should support sending multiple archives with multiple mediaTypes, rather than a single tar.

An Op would have a dependency on an arbitrary number of contexts, and can be executed when these deps are ready.
So some ops can be executed before the whole context has been sent.

Uh, it seems this is already considered as Sources 😄

Could we also explicitly define process/service boundaries? For example, should a frontend be implemented in a client CLI, or as an independent swarm service that is independently scalable/upgradable/pluggable? Do we intend to use Swarm at all for that? Now that Docker is clearly multi-platform, should we go with a multi-worker architecture so that a single Buildkit deployment can build images for Linux, Windows, ARM, and whatever comes next, from a single endpoint?

Anyway, I am very interested in it :)

@simonferquel

Could we also explicitly define process/service boundaries? For example, should a frontend be implemented in a client CLI, or as an independent swarm service that is independently scalable/upgradable/pluggable?

buildkit is a library, so it is meant to be wrapped by any CLI. It will probably have a "test-cli" that supports swapping frontends, but that is specific to that binary. You are probably talking about the docker build case. I think it would be much more powerful if we allowed docker build to plug into different frontends at runtime, while they can just run as separate services internally. The current design implies that a frontend also has access to sources/snapshots. For example, only the Dockerfile frontend knows that the default build definition is in a file called Dockerfile, so it will read that file from the snapshot. I'd like to give more control to extension points so they can really innovate and not just be limited to copying what Dockerfile does. There is a (loose) definition of a control API for invoking build jobs.

Do we intend to use Swarm at all for that?

Buildkit should support multiple instances of workers (design TBD). For docker build to use this with swarm, it is currently missing a component that would let you move snapshots between nodes. If that is done, it should be possible to have a distributed docker build. So the intention is there, but that isn't the most critical problem atm.

Now that Docker is clearly multi-platform, should we go with a multi-worker architecture so that a single Buildkit deployment can build images for Linux, Windows, ARM, and whatever comes next, from a single endpoint?

This should be possible with multiple workers. How this is managed by the user should be controlled by the frontend.

nit: I'm not sure what the planned github org/repo is yet, but I just noticed that https://github.com/buildkit is already used by other people 😅

Looks like that was abandoned, so perhaps still possible

Distributed mode proposal. RFC.

moby/buildkit#62

opened moby/buildkit#160 for design, roadmap, and bof notes

Has anyone compared this design to the design of Nix? It seems very similar in goals and I'm curious if that was considered before starting this, and if it was, what the major differences are.

  • Nix doesn't seem to have cache-aware distributed mode
  • Nix doesn't use containers by default
  • Nix is written in C++ while BuildKit is in Go
  • Nix is released under LGPL while BuildKit is under Apache License 2
  • And more

Bazel is also similar but

  • Bazel supports Hazelcast-based cache while BuildKit is trying to remove such dependency
  • Bazel doesn't use containers by default
  • Bazel is written in C++ while BuildKit is in Go
  • Bazel is governed by Google while BuildKit is by the Moby Project

thanks!

Nix doesn't seem to have cache-aware distributed mode

It has built-in distributed builds and that's what powers https://hydra.nixos.org and other related build environments, unless I'm misunderstanding what you mean.

Nix doesn't use containers by default

This has been controversial but fully isolated builds are one flag away (and much of the community wants that flag to default to true, even if it doesn't do so today), using Linux namespaces, seccomp, and no capabilities. This means zero network connectivity inside a build, and the only parts of the filesystem visible to you are the ones you need for your dependencies. On macOS there's a different mechanism with similar properties.

Nix is written in C++ while BuildKit is in Go
Nix is released under LGPL while BuildKit is under Apache License 2

Can't argue there 😄

Anyway, I mostly just like to see acknowledgments of related work when new work is started/proposed. It's very different to know the landscape and deliberately choose to try something different, vs. accidentally reinventing the wheel. It sounds like you know what's going on but it would be lovely to get a more fleshed out version of these points in a permanent location, because I'm pretty sure I won't be the only person with these questions when they see your project.

Thanks!

Hey, so I was wondering: since this is being worked on to be moved upstream, I think I can help with testing. I am sure @crosbymichael and others remember what happened the last time someone rewrote the builder and we had to test it and fix a ton of bugs. Anyway, I successfully ran buildkit on all my images, but I am sure mine all kinda have the same semantic structure, so if you want help testing this across many, I would be happy to help.

Also @tonistiigi thanks for buildkit it is truly awesome :)

@jessfraz I think we want to integrate containerd contentstore/snapshotter/imagestore to Moby first.

BuildKit has already been integrated into Docker since v18.06

@tonistiigi Can we close this issue?

Remind me of the context?

@alexellis I think it's a spam account