containers / conmon-rs

An OCI container runtime monitor written in Rust

Idea: Would it be possible to create the cgroup in advance?

utam0k opened this issue · comments

This is just an idea. I'd like to discuss this with the conmon-rs team.

As you may know, creating a cgroup is costly; it is actually one of the most time-consuming tasks in a container runtime that follows the OCI runtime spec.
Youki previously considered creating the cgroup asynchronously with io_uring, but this did not yield very good results.
However, a long-running daemon such as a server should have enough time to create the cgroup in advance. In that case, the container runtime should be able to skip the cgroup creation step by creating the process with clone3. Wdyt?
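To make the proposal concrete, here is a minimal sketch of spawning a process directly into a cgroup that already exists, using `clone3` with the `CLONE_INTO_CGROUP` flag (available since Linux 5.7, cgroup v2 only). It is not taken from conmon-rs, youki, or crun. The helper names `clone_args_for_cgroup` and `clone3_into_cgroup` are hypothetical, and the constants assume a 64-bit Linux target such as x86_64 or aarch64.

```rust
use std::ffi::c_long;
use std::fs::File;
use std::io;
use std::mem::size_of;
use std::os::unix::io::AsRawFd;

/// Mirrors `struct clone_args` from <linux/sched.h> (version 2 layout, 88 bytes).
#[repr(C)]
#[derive(Default, Debug)]
pub struct CloneArgs {
    pub flags: u64,
    pub pidfd: u64,
    pub child_tid: u64,
    pub parent_tid: u64,
    pub exit_signal: u64,
    pub stack: u64,
    pub stack_size: u64,
    pub tls: u64,
    pub set_tid: u64,
    pub set_tid_size: u64,
    pub cgroup: u64,
}

/// Flag asking the kernel to place the child in the cgroup given by `cgroup`.
pub const CLONE_INTO_CGROUP: u64 = 0x2_0000_0000;
/// Assumption: syscall number and signal value for x86_64 / aarch64 Linux.
const SYS_CLONE3: c_long = 435;
const SIGCHLD: u64 = 17;

extern "C" {
    // Provided by the C library that Rust's std already links against.
    fn syscall(num: c_long, ...) -> c_long;
}

/// Builds the clone3 arguments for a fork-like child placed in `cgroup_dir`.
/// `cgroup_dir` must be an open directory of a pre-created cgroup v2 group.
pub fn clone_args_for_cgroup(cgroup_dir: &File) -> CloneArgs {
    CloneArgs {
        flags: CLONE_INTO_CGROUP,
        exit_signal: SIGCHLD,
        cgroup: cgroup_dir.as_raw_fd() as u64,
        ..Default::default()
    }
}

/// Returns Ok(0) in the child and Ok(child_pid) in the parent.
///
/// # Safety
/// Behaves like fork(): the caller must not rely on other threads in the child.
pub unsafe fn clone3_into_cgroup(cgroup_dir: &File) -> io::Result<i64> {
    let mut args = clone_args_for_cgroup(cgroup_dir);
    let ret = unsafe {
        syscall(SYS_CLONE3, &mut args as *mut CloneArgs, size_of::<CloneArgs>())
    };
    if ret < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret as i64)
    }
}
```

A caller would open the pre-created cgroup directory with `File::open` (the path depends on how the daemon laid out its cgroups) and pass it to `clone3_into_cgroup`. The child then starts life inside that cgroup, so no separate write to `cgroup.procs` is needed.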

This idea is inspired by:

Thus reducing the amount of exec calls that must happen in the container engine, and reducing the amount of memory it uses.

Hey @utam0k, thank you for reaching out!

I'm wondering if cgroup creation should really be a concern of conmon-rs. On the other hand, we're also thinking about moving parts of the namespace handling into it. What would the interface between (let's say) youki and conmon-rs look like?

Wouldn't it be possible to use clone3 directly within the runtime in the same way as crun does it?
containers/crun#1042

What are your thoughts on that @giuseppe @haircommander ?

Thanks for your reaction 🙏

I'm still too new to this project to know that, so please close this if it is outside the project's scope.

I'm wondering if cgroup creation should really be a concern of conmon-rs. On the other hand, we're also thinking about moving parts of the namespace handling into it. What would the interface between (let's say) youki and conmon-rs look like?

Of course, we can implement it. Sorry if I have misunderstood. That assumes that the cgroup to which the container process must belong is created beforehand by the caller of the OCI runtime, right?

Wouldn't it be possible to use clone3 directly within the runtime in the same way as crun does it?
containers/crun#1042

this did not yield very good results

what went wrong with this?

the other hand we're also thinking about moving parts of the namespace handling into it

This is true, though the motivation is slightly different. conmon-rs will take responsibility for creating pod-level namespaces. However, in the Kubernetes world, CRI-O is not responsible for creating the pod-level cgroup (the kubelet is).

@haircommander
Here are the details. In summary, at that time there was no way to create a directory with io_uring 😭
containers/youki#327

what went wrong with this?

Oh, really? I didn't know about that. Thanks for telling me. In other words, when an OCI container runtime is used by the kubelet, it doesn't need to create a cgroup dir by itself, right?

the kubelet is

Sort of. The kubelet uses libcontainer to create the pod-level cgroup. The OCI runtime doesn't care much about pod cgroups, as it only focuses on the container cgroup. If the pod cgroup has not been created, the OCI runtime creates it. Otherwise it's treated like any other cgroup (like putting the container in system.slice).
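The "create it only if it is missing" behaviour described here can be sketched against the cgroup v2 filesystem interface. This is an illustration under stated assumptions, not how any particular runtime implements it. `ensure_cgroup` and `attach_pid` are hypothetical names, and the cgroup root is a parameter (normally `/sys/fs/cgroup`) so the logic can be tried against an ordinary directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Ensures the cgroup directory `relative` exists below `root`.
/// Returns the full path and whether this call had to create it.
/// Missing parents (for example a pod-level cgroup) are created too.
pub fn ensure_cgroup(root: &Path, relative: &str) -> io::Result<(PathBuf, bool)> {
    // Strip leading slashes so `join` does not discard `root`.
    let dir = root.join(relative.trim_start_matches('/'));
    let existed = dir.is_dir();
    fs::create_dir_all(&dir)?;
    Ok((dir, !existed))
}

/// Moves `pid` into the cgroup by writing it to `cgroup.procs`,
/// which is how cgroup v2 migrates a process between groups.
pub fn attach_pid(cgroup_dir: &Path, pid: u32) -> io::Result<()> {
    fs::write(cgroup_dir.join("cgroup.procs"), pid.to_string())
}
```

When the caller (such as the kubelet) has already created the pod-level cgroup, `ensure_cgroup` only has to add the container-level directory underneath it; otherwise it creates the whole chain.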

I don't think that is really possible when using the systemd cgroup manager. systemd itself takes care of creating the cgroup, and to do that it first needs to know the PID to move into the new cgroup, so it is not possible to create an empty cgroup.