enhancement: systemd-nspawn to launch real Image= container
rektide opened this issue
using systemd-nspawn would let us run real containers, under systemd. from the man page's description:
systemd-nspawn may be used to run a command or OS in a light-weight namespace container. In many ways it is similar to chroot(1), but more powerful since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems and the host and domain name.
it would be neat to add a mode to systemk that allows us to use systemd-nspawn to run full containers. Image= could be real images. systemd-nspawn supports a variety of network modes, such as macvlan and ipvlan. there is a machinectl shell command which would enable runInContainer, a systemk feature request tracked in #37. pretty specific/fancy stuff, but there are also things like the ability to clone a btrfs subvolume & launch that.
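to make the runInContainer idea a bit more concrete, here is a minimal sketch of how an exec request could be mapped onto machinectl shell. the function name and the my-service machine name are made up for illustration, and this only builds the argument list rather than running anything. it assumes the machine is already running and registered with systemd-machined.

```python
from typing import List


def machinectl_shell_argv(machine: str, command: List[str], user: str = "root") -> List[str]:
    """Build the argv for running `command` inside a running nspawn machine.

    Hypothetical helper, not part of systemk. It follows the documented
    form `machinectl shell [[NAME@]NAME [PATH [ARGUMENTS...]]]`.
    """
    if not command:
        raise ValueError("command must not be empty")
    # The man page documents this argument as PATH; insisting on an absolute
    # path here is a conservative assumption, not a confirmed requirement.
    if not command[0].startswith("/"):
        raise ValueError("expected an absolute path to the program")
    return ["machinectl", "shell", f"{user}@{machine}", *command]


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(" ".join(machinectl_shell_argv("my-service", ["/bin/ls", "-l", "/etc"])))
```

systemk would presumably hand this argv to whatever it already uses to spawn processes and stream stdout/stderr back to the API server.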
running a systemd-nspawn container is a two-step process:
- create a filesystem in /var/lib/machines/ with the expanded image. systemd's machinectl tool includes import-tar and import-fs helpers which could help load images. to fetch an image, we could use docker save, docker export, or something like what nspawn-oci does to load the image (using skopeo to get & oci-image-tools to expand an image).
- run the container. this can be done ephemerally, or we can create a unit.
  a. use systemd-nspawn to run the container once off, or
  b. create a /etc/systemd/nspawn/my-service.nspawn unit file to configure a container, then run machinectl start to start it, in a managed way.
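the two steps above can be sketched roughly as follows. this is only an illustration that builds command lines and unit file text; nothing is executed. all function names are hypothetical, the tarball is assumed to be a flat root filesystem (e.g. from docker export), and i have not verified on a real host how Boot=no interacts with the systemd-nspawn@.service template that machinectl start uses.

```python
from typing import List, Optional

MACHINES_DIR = "/var/lib/machines"
NSPAWN_DIR = "/etc/systemd/nspawn"


def import_tar_argv(tarball: str, machine: str) -> List[str]:
    """Step 1: unpack a root filesystem tarball as MACHINES_DIR/<machine>."""
    return ["machinectl", "import-tar", tarball, machine]


def run_once_argv(machine: str, command: List[str]) -> List[str]:
    """Step 2a: run a command once in the imported tree."""
    return [
        "systemd-nspawn",
        f"--directory={MACHINES_DIR}/{machine}",
        f"--machine={machine}",
        *command,
    ]


def nspawn_unit_path(machine: str) -> str:
    """Step 2b: where the per-machine settings file would live."""
    return f"{NSPAWN_DIR}/{machine}.nspawn"


def render_nspawn_unit(command: List[str], macvlan: Optional[str] = None) -> str:
    """Step 2b: settings file contents ([Exec] and optional [Network]).

    Simplification: arguments containing whitespace are not quoted.
    """
    lines = ["[Exec]", "Boot=no", "Parameters=" + " ".join(command)]
    if macvlan is not None:
        # Host interface to create a macvlan interface from (assumed name).
        lines += ["", "[Network]", f"MACVLAN={macvlan}"]
    return "\n".join(lines) + "\n"


def start_argv(machine: str) -> List[str]:
    """Step 2b: start the machine in a managed way."""
    return ["machinectl", "start", machine]


if __name__ == "__main__":
    print(" ".join(import_tar_argv("rootfs.tar", "my-service")))
    print(" ".join(run_once_argv("my-service", ["/usr/bin/my-service"])))
    print(nspawn_unit_path("my-service"))
    print(render_nspawn_unit(["/usr/bin/my-service"], macvlan="eth0"), end="")
    print(" ".join(start_argv("my-service")))
```

keeping each step as a pure function that returns an argv makes it easy to unit test the mapping from a pod spec without needing root or a systemd host.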
note that two years ago systemd seemed interested in becoming a runc-compatible runtime. if that happens, my understanding is we could just run containerd directly against it, which might be a better idea than adapting systemk.
background: current systemk architecture
some notes i took, investigating how systemk works now
kubernetes pods are backed by systemd .service units created by the unit manager.
these systemd services use the local system to run. Image= refers to debian packages that are run on the system.
TL;DR is that's interesting but then why not just run a kubelet + CRI-compatible container runtime, eg containerd?
Philosophical question aside, I do think the feature requested above is pretty doable. However, it would raise a problem: systemk would provide a different UX for containerized vs non-containerized workloads, e.g. RunInContainer, which is not possible (yet?) for the latter form of workload.
Can you still run root-less in the above scenario?