containers / youki

A container runtime written in Rust

Home Page: https://containers.github.io/youki/

`youki` fails to run in `docker-in-docker` with `cgroups v1`

jprendes opened this issue

Reproduction:

cargo install cross --git https://github.com/cross-rs/cross
git clone --branch dind git@github.com:jprendes/youki.git
cd youki
cross build --features systemd,v1,v2 --bin youki
./dind.sh

If --runtime=youki is removed from dind.sh, the example runs fine; with youki, it fails.

Old error message

The error I receive is:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/278419ddb31fde6701841c5b302a632751de2a5a6aa051a62530b12ee8ab4763/log.json: no such file or directory): /youki did not terminate successfully: exit status 1: unknown.
ERRO[0003] error waiting for container:

This originates from trying to run runwasi on docker-in-docker. In that case it works fine on cgroups v2 but fails on cgroups v1, and the error occurs while mounting cgroups.

New error message

After working around the issue with journald, I get an error when using youki with dind + cgroups v1:

libcontainer::rootfs::mount: failed to canonicalize "/sys/fs/cgroup/systemd/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6": No such file or directory (os error 2)

See #2528 (comment) for log context.
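The "failed to canonicalize" error means the bind-mount source path does not exist inside the docker-in-docker container. Assuming youki resolves the source with something equivalent to `std::fs::canonicalize` (an assumption; the thread only shows the log line), this minimal sketch shows how a missing path surfaces as `ErrorKind::NotFound`, i.e. `os error 2`. The helper `is_missing` is hypothetical.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Hypothetical helper: returns `true` when `path` cannot be canonicalized
/// because some component does not exist (the `os error 2` case in the log).
/// Other errors, such as permission problems, are not treated as "missing".
fn is_missing(path: &Path) -> bool {
    match fs::canonicalize(path) {
        Ok(_) => false,
        Err(e) => e.kind() == ErrorKind::NotFound,
    }
}

fn main() {
    // The filesystem root always exists, so it canonicalizes fine.
    assert!(!is_missing(Path::new("/")));
    // Stand-in for the cgroup source that is absent inside dind.
    let src = Path::new("/sys/fs/cgroup/systemd/docker/does-not-exist");
    println!("{} missing: {}", src.display(), is_missing(src));
}
```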

From the dockerd log files

cleanup warnings
    time="2023-11-14T09:19:48Z"
    level=warning
    msg="failed to remove runc container"
    error="/youki did not terminate successfully: exit status 1: failed to initialize observability: No such file or directory (os error 2)\nError: No such file or directory (os error 2)\n"
    runtime=io.containerd.runc.v2
    time="2023-11-14T09:19:48Z"
    level=warning msg="failed to read init pid file"
    error="open /run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/0ad434dd1d96105d31e416174c5c4d3b6a721c32586b612218dcad3cb68bb9f6/init.pid: no such file or directory"
    runtime=io.containerd.runc.v2 
    namespace=moby

I believe fixing this particular error would be the first step, not the final solution.
As I mentioned before, runwasi fails while mounting cgroups, so we should eventually hit the same issue here.

I think this is a side effect of the failure to create the container, just showing up as a different error. The "failed to read init pid file" warning is most likely there because the mount failed: the init process was never created, so no pid file was written. I think this would get fixed automatically by addressing the original issue; it is not a separate problem.

The above error is due to journald logging. Working around that, I get to the cgroups error:

2023-11-14T10:35:57.050971Z  INFO libcgroups::common: cgroup manager V1 will be used
2023-11-14T10:35:57.051044Z DEBUG libcgroups::v1::manager: Get path for subsystem: cpu
2023-11-14T10:35:57.051816Z DEBUG libcgroups::v1::manager: Get path for subsystem: cpuacct
2023-11-14T10:35:57.052422Z DEBUG libcgroups::v1::manager: Get path for subsystem: cpuset
2023-11-14T10:35:57.053005Z DEBUG libcgroups::v1::manager: Get path for subsystem: devices
2023-11-14T10:35:57.053567Z DEBUG libcgroups::v1::manager: Get path for subsystem: hugetlb
2023-11-14T10:35:57.054149Z DEBUG libcgroups::v1::manager: Get path for subsystem: memory
2023-11-14T10:35:57.054748Z DEBUG libcgroups::v1::manager: Get path for subsystem: pids
2023-11-14T10:35:57.055309Z DEBUG libcgroups::v1::manager: Get path for subsystem: perf_event
2023-11-14T10:35:57.055930Z DEBUG libcgroups::v1::manager: Get path for subsystem: blkio
2023-11-14T10:35:57.056538Z DEBUG libcgroups::v1::manager: Get path for subsystem: net_prio
2023-11-14T10:35:57.057108Z DEBUG libcgroups::v1::manager: Get path for subsystem: net_cls
2023-11-14T10:35:57.057789Z DEBUG libcgroups::v1::manager: Get path for subsystem: freezer
2023-11-14T10:35:57.081832Z DEBUG libcgroups::v1::blkio: Apply blkio cgroup config
2023-11-14T10:35:57.081903Z DEBUG libcgroups::v1::devices: Apply Devices cgroup config
2023-11-14T10:35:57.082138Z DEBUG libcgroups::v1::memory: Apply Memory cgroup config
2023-11-14T10:35:57.082191Z DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Pid, path: None }
2023-11-14T10:35:57.082381Z DEBUG libcontainer::process::channel: sending init pid (Pid(167))
2023-11-14T10:35:57.082664Z DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Uts, path: None }
2023-11-14T10:35:57.082753Z DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Ipc, path: None }
2023-11-14T10:35:57.082809Z DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Network, path: None }
2023-11-14T10:35:57.083165Z DEBUG libcontainer::namespaces: unshare or setns: LinuxNamespace { typ: Mount, path: None }
2023-11-14T10:35:57.083306Z DEBUG libcontainer::rootfs::rootfs: prepare rootfs rootfs="/var/lib/docker/rootfs/overlayfs/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0"
2023-11-14T10:35:57.084111Z DEBUG libcontainer::rootfs::rootfs: mount root fs "/var/lib/docker/rootfs/overlayfs/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0"
2023-11-14T10:35:57.084162Z DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/proc", typ: Some("proc"), source: Some("proc"), options: Some(["nosuid", "noexec", "nodev"]) }
2023-11-14T10:35:57.084312Z DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/dev", typ: Some("tmpfs"), source: Some("tmpfs"), options: Some(["nosuid", "strictatime", "mode=755", "size=65536k"]) }
2023-11-14T10:35:57.084439Z DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/dev/pts", typ: Some("devpts"), source: Some("devpts"), options: Some(["nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"]) }
2023-11-14T10:35:57.084526Z DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/sys", typ: Some("sysfs"), source: Some("sysfs"), options: Some(["nosuid", "noexec", "nodev", "ro"]) }
2023-11-14T10:35:57.084616Z DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/sys/fs/cgroup", typ: Some("cgroup"), source: Some("cgroup"), options: Some(["ro", "nosuid", "noexec", "nodev"]) }
2023-11-14T10:35:57.084675Z DEBUG libcontainer::rootfs::mount: mounting cgroup v1 filesystem
2023-11-14T10:35:57.084711Z DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/sys/fs/cgroup", typ: Some("tmpfs"), source: Some("tmpfs"), options: Some(["noexec", "nosuid", "nodev", "mode=755"]) }
2023-11-14T10:35:57.085640Z DEBUG libcontainer::rootfs::mount: cgroup mounts: ["/sys/fs/cgroup/systemd", "/sys/fs/cgroup/cpu,cpuacct", "/sys/fs/cgroup/devices", "/sys/fs/cgroup/rdma", "/sys/fs/cgroup/hugetlb", "/sys/fs/cgroup/pids", "/sys/fs/cgroup/net_cls,net_prio", "/sys/fs/cgroup/cpuset", "/sys/fs/cgroup/misc", "/sys/fs/cgroup/blkio", "/sys/fs/cgroup/freezer", "/sys/fs/cgroup/memory", "/sys/fs/cgroup/perf_event"]
2023-11-14T10:35:57.085798Z DEBUG libcontainer::rootfs::mount: Process cgroups: {"memory": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6/docker/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0", "misc": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6", "": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6", "perf_event": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6/docker/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0", "freezer": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6/docker/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0", "cpuset": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6/docker/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0", "net_cls,net_prio": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6/docker/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0", "hugetlb": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6/docker/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0", "devices": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6/docker/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0", "rdma": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6", "name=systemd": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6", "pids": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6/docker/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0", "blkio": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6/docker/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0", "cpu,cpuacct": "/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6/docker/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0"}
2023-11-14T10:35:57.085938Z DEBUG libcontainer::rootfs::mount: cgroup root: "/var/lib/docker/rootfs/overlayfs/c08259f3ad0c2c0c99c8016da601c212fd304c8b4aa8391d2b70486a2b73d7e0/sys/fs/cgroup"
2023-11-14T10:35:57.085964Z DEBUG libcontainer::rootfs::mount: Mounting (emulated) "systemd" cgroup subsystem
2023-11-14T10:35:57.085994Z DEBUG libcontainer::rootfs::mount: Mounting emulated cgroup subsystem: Mount { destination: "/sys/fs/cgroup/systemd", typ: Some("bind"), source: Some("/sys/fs/cgroup/systemd/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6"), options: Some(["rw", "rbind"]) }
2023-11-14T10:35:57.086030Z DEBUG libcontainer::rootfs::mount: mounting Mount { destination: "/sys/fs/cgroup/systemd", typ: Some("bind"), source: Some("/sys/fs/cgroup/systemd/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6"), options: Some(["rw", "rbind"]) }
2023-11-14T10:35:57.086112Z ERROR libcontainer::rootfs::mount: failed to canonicalize "/sys/fs/cgroup/systemd/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6": No such file or directory (os error 2)
2023-11-14T10:35:57.086150Z ERROR libcontainer::rootfs::mount: failed to mount Mount { destination: "/sys/fs/cgroup/systemd", typ: Some("bind"), source: Some("/sys/fs/cgroup/systemd/docker/af0c557cb9806654e1a9eac1de4a12afda57bf9f60952e4c3db016668c554ea6"), options: Some(["rw", "rbind"]) }: io error
2023-11-14T10:35:57.086189Z ERROR libcontainer::rootfs::mount: failed to mount systemd cgroup hierarchy: io error
2023-11-14T10:35:57.086227Z ERROR libcontainer::rootfs::mount: failed to mount cgroup v2: io error
2023-11-14T10:35:57.086253Z ERROR libcontainer::process::container_init_process: failed to prepare rootfs err=Mount(Io(Os { code: 2, kind: NotFound, message: "No such file or directory" }))
2023-11-14T10:35:57.086291Z ERROR libcontainer::process::container_intermediate_process: failed to initialize container process: failed to prepare rootfs
2023-11-14T10:35:57.086653Z ERROR libcontainer::process::container_main_process: failed to wait for init ready: failed to receive. "waiting for init ready". BrokenChannel
2023-11-14T10:35:57.086701Z ERROR libcontainer::container::builder_impl: failed to run container process err=Channel(ReceiveError { msg: "waiting for init ready", source: BrokenChannel })
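The "Process cgroups" map in this log looks like the contents of `/proc/self/cgroup` (format `hierarchy-ID:controller-list:cgroup-path`): the cgroup v2 entry `0::/...` has an empty controller list, hence the `""` key, and the named systemd hierarchy appears as `name=systemd`. The failing bind source is that path appended to the hierarchy mount point. One plausible reading (an assumption, not confirmed in this thread) is that inside the dind container `/sys/fs/cgroup/systemd` is already the container's own `/docker/af0c…` subtree, so appending the path again points at a directory that does not exist. This sketch reproduces the path computation; `parse_proc_cgroup` and `bind_source` are hypothetical names, and the IDs in the sample are abbreviated.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Parses text in the `/proc/<pid>/cgroup` format into a map from the
/// controller list (e.g. "memory", "name=systemd", or "" for cgroup v2)
/// to the cgroup path.
fn parse_proc_cgroup(content: &str) -> HashMap<String, String> {
    content
        .lines()
        .filter_map(|line| {
            // splitn(3, ..) keeps any ':' that appears inside the path.
            let mut parts = line.splitn(3, ':');
            let _hierarchy_id = parts.next()?;
            let controllers = parts.next()?;
            let path = parts.next()?;
            Some((controllers.to_string(), path.to_string()))
        })
        .collect()
}

/// Hypothetical helper: joins a hierarchy mount point with a cgroup path,
/// the way the emulated bind-mount source in the log appears to be built.
fn bind_source(mount_point: &Path, cgroup_path: &str) -> PathBuf {
    // Strip the leading '/' so `join` appends instead of replacing.
    mount_point.join(cgroup_path.trim_start_matches('/'))
}

fn main() {
    // Illustrative sample with abbreviated container IDs.
    let sample = "13:name=systemd:/docker/af0c\n0::/docker/af0c\n5:memory:/docker/af0c/docker/c082\n";
    let cgroups = parse_proc_cgroup(sample);
    let src = bind_source(Path::new("/sys/fs/cgroup/systemd"), &cgroups["name=systemd"]);
    println!("{}", src.display()); // prints "/sys/fs/cgroup/systemd/docker/af0c"
}
```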

> I think this is a side effect of the failure to create the container, just showing up as a different error. The "failed to read init pid file" warning is most likely there because the mount failed: the init process was never created, so no pid file was written. I think this would get fixed automatically by addressing the original issue; it is not a separate problem.

I think the interesting part of that error message was the "failed to initialize observability" part.
Working around that issue, I get to the same issue as in runwasi.
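The thread attributes the "failed to initialize observability: No such file or directory" error to journald logging but does not say how it was worked around. As a hypothetical illustration only (not youki's observability code), a runtime could log to journald only when journald's native socket, `/run/systemd/journal/socket`, exists and otherwise fall back to stderr; a dind container typically runs dockerd without systemd, so the socket would be absent. `LogSink` and `choose_sink` are invented names.

```rust
use std::path::Path;

/// Hypothetical log destination; not a youki type.
#[derive(Debug, PartialEq)]
enum LogSink {
    Journald,
    Stderr,
}

/// Picks journald only when its socket path exists, so a missing socket
/// leads to a stderr fallback instead of a startup error.
fn choose_sink(journald_socket: &Path) -> LogSink {
    if journald_socket.exists() {
        LogSink::Journald
    } else {
        LogSink::Stderr
    }
}

fn main() {
    // journald's native protocol socket on systemd hosts.
    let sink = choose_sink(Path::new("/run/systemd/journal/socket"));
    println!("log sink: {:?}", sink);
}
```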

What happens if we use the cgroup v2 driver instead of the systemd driver?

I might be misunderstanding you, but the host has cgroups v1, so using a v2 driver would not be correct, right?

It looks like, when I run it, youki takes the setup_emulated_subsystem path instead of setup_namespaced_subsystem.
That branch is controlled by cgroups_ns.
If I ignore cgroups_ns and unconditionally use setup_namespaced_subsystem, everything works fine.
I'm not familiar with the need for emulation here. Why do we need it, and how does it relate to cgroups_ns?
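A possible answer, based on how runc handles the same case (an assumption about youki's intent, not confirmed in the thread): without a cgroup namespace, a freshly mounted cgroup filesystem would expose the host's entire hierarchy, so the runtime "emulates" a namespaced view by bind-mounting only the container's subtree. With a cgroup namespace, the kernel roots a new cgroup mount at the namespace's cgroup, so a direct mount is safe. This sketch models that branch with hypothetical names (`CgroupMountStrategy`, `choose_strategy`); it is not youki's code.

```rust
/// Hypothetical model of the cgroups_ns branch; not a youki type.
#[derive(Debug, PartialEq)]
enum CgroupMountStrategy {
    /// Mount a new `cgroup` filesystem for the subsystem; inside a cgroup
    /// namespace the kernel roots it at the namespace's cgroup.
    Namespaced { fs_type: &'static str, options: String },
    /// Bind-mount the container's subtree out of an existing hierarchy.
    Emulated { source: String },
}

fn choose_strategy(
    cgroup_ns: bool,
    mount_point: &str,
    subsystem: &str,
    cgroup_path: &str,
) -> CgroupMountStrategy {
    if cgroup_ns {
        CgroupMountStrategy::Namespaced {
            fs_type: "cgroup",
            options: subsystem.to_string(),
        }
    } else {
        // Host mount point + process cgroup path: the source that is
        // missing inside docker-in-docker according to the log.
        CgroupMountStrategy::Emulated {
            source: format!("{}{}", mount_point, cgroup_path),
        }
    }
}

fn main() {
    let s = choose_strategy(false, "/sys/fs/cgroup/systemd", "name=systemd", "/docker/af0c");
    println!("{:?}", s);
}
```

The emulated branch depends on the host path existing at `mount_point + cgroup_path`, which is exactly the assumption that breaks in the dind logs above.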

I think I have a fix for it: main...jprendes:youki:fix-dind

It still needs some kind of test. Unfortunately, the Docker image is built on Alpine, so it needs a musl build of youki.

Thanks for this fix:

main...jprendes:youki:fix-dind#diff-7593ca57a3e963ffce568c39b57a4632daf6378db43c6f66b0f3eac191abaf94R83-R91

I also experience this issue when developing youki in a dev container.

I'll create a PR after adding some tests.

> I might be misunderstanding you, but the host has cgroups v1, so using a v2 driver would not be correct, right?

Yes, you can see this information using youki info.

> What happens if we use the cgroup v2 driver instead of the systemd driver?

How would I do that?

youki info:

Output
$ sudo ./target/x86_64-unknown-linux-musl/debug/youki info
DEBUG youki: started by user 0 with ArgsOs { inner: ["./target/x86_64-unknown-linux-musl/debug/youki", "info"] }
Version           0.3.0
Commit            c7567ab4
Kernel-Release    5.15.0-88-generic
Kernel-Version    #98~20.04.1-Ubuntu SMP Mon Oct 9 16:43:45 UTC 2023
Architecture      x86_64
Operating System  Ubuntu 20.04.6 LTS
Cores             8
Total Memory      7936
Cgroup setup      hybrid
Cgroup mounts
  blkio           /sys/fs/cgroup/blkio
  cpu             /sys/fs/cgroup/cpu,cpuacct
  cpuacct         /sys/fs/cgroup/cpu,cpuacct
  cpuset          /sys/fs/cgroup/cpuset
  devices         /sys/fs/cgroup/devices
  freezer         /sys/fs/cgroup/freezer
  hugetlb         /sys/fs/cgroup/hugetlb
  memory          /sys/fs/cgroup/memory
  net_cls         /sys/fs/cgroup/net_cls,net_prio
  net_prio        /sys/fs/cgroup/net_cls,net_prio
  perf_event      /sys/fs/cgroup/perf_event
  pids            /sys/fs/cgroup/pids
  unified         /sys/fs/cgroup/unified
CGroup v2 controllers
  cpu             detached
  cpuset          detached
  hugetlb         detached
  io              detached
  memory          detached
  pids            detached
  device          attached
Namespaces        enabled
  mount           enabled
  uts             enabled
  ipc             enabled
  user            enabled
  pid             enabled
  network         enabled
  cgroup          enabled
Capabilities
CAP_BPF           available
CAP_PERFMON       available
CAP_CHECKPOINT_RESTORE available

Thanks. It looks like it is using cgroup v2. In that case, if you enable the systemd feature, the systemd cgroup manager is used:

CgroupSetup::Unified => {
    // ref https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#cgroups-path
    if cgroup_path.is_absolute() || !config.systemd_cgroup {
        return Ok(create_v2_cgroup_manager(root, cgroup_path)?.any());
    }
    Ok(
        create_systemd_cgroup_manager(root, cgroup_path, config.container_name.as_str())?
            .any(),
    )
}
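`youki info` above reports `Cgroup setup hybrid`, not unified, so the `CgroupSetup::Unified` arm quoted here would not be taken on that host. This sketch classifies a cgroup root with a simple heuristic (a cgroup v2 root exposes `cgroup.controllers`; in a hybrid setup the v2 hierarchy sits at `unified/`) and then selects a manager. The Unified arm mirrors the snippet; mapping Legacy and Hybrid to the v1 manager is an assumption consistent with the "cgroup manager V1 will be used" log line earlier in this thread. All names are hypothetical, and youki's real detection may inspect filesystem types instead.

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Clone, Copy)]
enum CgroupSetup {
    Legacy,
    Hybrid,
    Unified,
}

/// Heuristic detection (assumption): look for `cgroup.controllers`,
/// which only exists in the root of a cgroup v2 hierarchy.
fn detect_setup(root: &Path) -> CgroupSetup {
    if root.join("cgroup.controllers").exists() {
        CgroupSetup::Unified
    } else if root.join("unified").join("cgroup.controllers").exists() {
        CgroupSetup::Hybrid
    } else {
        CgroupSetup::Legacy
    }
}

#[derive(Debug, PartialEq)]
enum Manager {
    V1,
    V2,
    Systemd,
}

/// Unified arm follows the quoted snippet; Legacy/Hybrid -> V1 is assumed.
fn select_manager(setup: CgroupSetup, cgroup_path: &Path, systemd_cgroup: bool) -> Manager {
    match setup {
        CgroupSetup::Legacy | CgroupSetup::Hybrid => Manager::V1,
        CgroupSetup::Unified => {
            if cgroup_path.is_absolute() || !systemd_cgroup {
                Manager::V2
            } else {
                Manager::Systemd
            }
        }
    }
}

fn main() {
    let setup = detect_setup(Path::new("/sys/fs/cgroup"));
    let manager = select_manager(setup, Path::new("system.slice:docker:abc"), true);
    println!("setup: {:?}, manager: {:?}", setup, manager);
}
```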

> Cgroup setup      hybrid

So you mean building without the systemd feature and trying again?

cross build --features v1,v2 --bin youki

That fails in the same way:

2023-11-16T12:07:36.598531Z DEBUG libcontainer::rootfs::mount: 307: Mounting (emulated) "systemd" cgroup subsystem
2023-11-16T12:07:36.598589Z DEBUG libcontainer::rootfs::mount: 347: Mounting emulated cgroup subsystem: Mount { destination: "/sys/fs/cgroup/systemd", typ: Some("bind"), source: Some("/sys/fs/cgroup/systemd/docker/340bef13dfc80b834d44b1fb8c3efe96a9ec59d553f194cc6a63f351411ad692"), options: Some(["rw", "rbind"]) }
2023-11-16T12:07:36.598656Z DEBUG libcontainer::rootfs::mount: 76: mounting Mount { destination: "/sys/fs/cgroup/systemd", typ: Some("bind"), source: Some("/sys/fs/cgroup/systemd/docker/340bef13dfc80b834d44b1fb8c3efe96a9ec59d553f194cc6a63f351411ad692"), options: Some(["rw", "rbind"]) }
2023-11-16T12:07:36.598809Z ERROR libcontainer::rootfs::mount: 506: failed to canonicalize "/sys/fs/cgroup/systemd/docker/340bef13dfc80b834d44b1fb8c3efe96a9ec59d553f194cc6a63f351411ad692": No such file or directory (os error 2)
2023-11-16T12:07:36.598868Z ERROR libcontainer::rootfs::mount: 127: failed to mount Mount { destination: "/sys/fs/cgroup/systemd", typ: Some("bind"), source: Some("/sys/fs/cgroup/systemd/docker/340bef13dfc80b834d44b1fb8c3efe96a9ec59d553f194cc6a63f351411ad692"), options: Some(["rw", "rbind"]) }: io error
2023-11-16T12:07:36.598935Z ERROR libcontainer::rootfs::mount: 350: failed to mount systemd cgroup hierarchy: io error
2023-11-16T12:07:36.599007Z ERROR libcontainer::rootfs::mount: 90: failed to mount cgroup v1: io error
2023-11-16T12:07:36.599051Z ERROR libcontainer::process::container_init_process: 326: failed to prepare rootfs err=Mount(Io(Os { code: 2, kind: NotFound, message: "No such file or directory" }))
2023-11-16T12:07:36.599114Z ERROR libcontainer::process::container_intermediate_process: 151: failed to initialize container process: failed to prepare rootfs
2023-11-16T12:07:36.599625Z ERROR libcontainer::process::container_main_process: 153: failed to wait for init ready: failed to receive. "waiting for init ready". BrokenChannel
2023-11-16T12:07:36.599728Z ERROR libcontainer::container::builder_impl: 156: failed to run container process err=Channel(ReceiveError { msg: "waiting for init ready", source: BrokenChannel })