containers / youki

A container runtime written in Rust

Home Page:https://containers.github.io/youki/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Unexpected `chdir` invoked on container `init` and `start`

Mossaka opened this issue · comments

While investigating a performance issue, I observed that the working directory /run/containerd/io.containerd.runtime.v2.task/<namespace>/<containerid>/ becomes inaccessible or gets deleted after executing the shim::wait() call in the runwasi shim process. This deletion prevents the shim process from reading the address file to delete the shim socket. (e.g. ref code)

Logs

I ran bpftrace on unlink and unlinkat syscalls on that paths and found that youki inner process unlinks the bundle path before containerd calls delete-shim (before process 2611761 gets started).

Process started: /usr/local/bin/containerd-shim-wasmtime-v1 PID: 2611672
Process started: /usr/local/bin/containerd-shim-wasmtime-v1 PID: 2611681
PID 2611707 (youki:[2:INIT]): File unlink in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 2611681 (client_handler): File unlinkat in target directory: /run/containerd/wasmtime/default/testwasm
Process started: /usr/local/bin/containerd-shim-wasmtime-v1 PID: 2611761
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/.testwasm..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/.testwasm..
PID 569984 (containerd): File unlinkat in target directory: .testwasm
PID 569984 (containerd): File unlinkat in target directory: .testwasm
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 2611658 (ctr): File unlinkat in target directory: testwasm-stderr
PID 2611658 (ctr): File unlinkat in target directory: testwasm-stdout
PID 2611658 (ctr): File unlinkat in target directory: testwasm-stdin

Specifically, this caught my attention: PID 2611707 (youki:[2:INIT]): File unlink in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..

Question:

I am raising this issue to try to understand why youki does that. This might be the reason why the shim process is not able to delete the socket address after the ttrpc server shuts down.

FYI: @utam0k @jprendes

I want to make sure something before the investigation:
a. Don't the executor and post hook you passed to libcontainer call unlink?
b. Is there the smallest step to reproduce this?
c. May I ask you to give us before and after syscalls to help us understand?

I will try to reproduce this in youki, getting back to you later.

Okay I spent more time tracing where the root cause is, and found that after handling the Create request in runwasi, the shim process's current directory has been set to the container root_directory (e.g. /run/youki/<ns>/<id>) by youki at here. And after the Delete request, youki cleans up the container path, and so the shim process doesn't have a current directory anymore.

Question: why does youki chdir to the container directory at init and container_start?

@Furisto Hi, Thomas. I'd like to know about your comment https://github.com/containers/youki/pull/143/files#r673503679. Is this assuming that console_socket was a relative path?

Is there a problem here?

let csocketfd = match socket::connect(
csocketfd.as_raw_fd(),
&socket::UnixAddr::new(socket_name).map_err(|err| TTYError::InvalidSocketName {
source: err,
socket_name: socket_name.to_string(),
})?,

Sorry, but I've created another PR to fix it.
#2780

Hey @Mossaka , The related PR will release soon, but I had a question with this issue -
You mentioned that

... youki at here. And after the Delete request, youki cleans up the container path, and so the shim process doesn't have a current directory anymore.
Question: why does youki chdir to the container directory at init and container_start?

I'm not sure why youki setting cwd in the container init process would have any issues with shim? The start, run and delete youki processes (created using youki create , youki start and youki delete resp.) would run independent of each other, so probably the only potential cause of the removed dir would be in the delete process right? What would be the issue with start and run processes (and by extension init )?