On some systems, mounting cgroup v1 by `setup_emulated_subsystem` may missing some subsystems
JCKeep opened this issue · comments
I'm writing an oci runtime using some youki's source code by learning, and find some bugs.
On my system, /proc/self/cgroup
file be like:
# cat /proc/self/cgroup
11:perf_event:/
10:freezer:/
9:memory:/
8:pids:/
7:devices:/
6:cpuacct,cpu:/
5:blkio:/
4:cpuset:/
3:hugetlb:/
2:net_prio,net_cls:/
1:name=systemd:/user.slice/user-0.slice/session-264.scope
the cpuacct,cpu
and net_prio,net_cls
was unsorted, so in function mount_cgroup_v1
:
let process_cgroups: HashMap<String, String> = Process::myself()?
.cgroups()?
.into_iter()
.map(|c| {
let hierarchy = c.hierarchy;
// When youki itself is running inside a container, the cgroup path
// will include the path of pid-1, which needs to be stripped before
// mounting.
let root_pathname = root_cgroups
.iter()
.find(|c| c.hierarchy == hierarchy)
.map(|c| c.pathname.as_ref())
.unwrap_or("");
let path = c
.pathname
.strip_prefix(root_pathname)
.unwrap_or(&c.pathname);
(c.controllers.join(","), path.to_owned())
})
.collect();
process_cgroups
will record cpuacct,cpu
and net_prio,net_cls
, but system comount in name cpu,cpuacct
and net_cls,net_prio
, so when using setup_emulated_subsystem
it find no subsystem to bind.
// in function setup_emulated_subsystem, it will search `cpu,cpuacct` and `net_cls,net_prio`,
// but process_cgroups store `cpuacct,cpu` and `net_prio,net_cls`
if let Some(proc_path) = process_cgroups.get(named_hierarchy.as_ref()) {
So when I test, it will not mount some cgroupv1 subsystems and not generate error.
I fixed it on my system like this, mybe this will work for other system?
let process_cgroups: HashMap<String, String> = Process::myself()?
.cgroups()?
.into_iter()
.map(|c| {
let hierarchy = c.hierarchy;
// When youki itself is running inside a container, the cgroup path
// will include the path of pid-1, which needs to be stripped before
// mounting.
let root_pathname = root_cgroups
.iter()
.find(|c| c.hierarchy == hierarchy)
.map(|c| c.pathname.as_ref())
.unwrap_or("");
let path = c
.pathname
.strip_prefix(root_pathname)
.unwrap_or(&c.pathname);
// some system the controllers may unsorted
(c.controllers.iter().sorted().join(","), path.to_owned())
})
.collect();
Thanks for your report, @JCKeep
the cpuacct,cpu and net_prio,net_cls was unsorted,
What does this mean? May I ask you to give me a more specific example or something?
@utam0k youki scan /proc/self/cgroup
to collect the cgroups, when call setup_emulated_subsystem
to bind mount cgroup subsystems, it will result in errors but not generate error.
for example, it may not mount net_cls,net_prio
subsystem, because my /proc/self/cgroup
is
[root@iZbp16aq5lgolei74n44yqZ ~]# cat /proc/self/cgroup
11:perf_event:/
10:freezer:/
9:memory:/
8:pids:/
7:devices:/
6:cpuacct,cpu:/
5:blkio:/
4:cpuset:/
3:hugetlb:/
2:net_prio,net_cls:/
1:name=systemd:/user.slice/user-0.slice/session-558.scope
in setup_emulated_subsystem
, it will search process_cgroups
by name "net_cls,net_prio", but it store "net_prio,net_cls"
#[cfg(feature = "v1")]
fn setup_emulated_subsystem(
&self,
cgroup_mount: &SpecMount,
options: &MountOptions,
subsystem_name: &str,
named: bool,
host_mount: &Path,
process_cgroups: &HashMap<String, String>,
) -> Result<()> {
tracing::debug!("Mounting (emulated) {:?} cgroup subsystem", subsystem_name);
let named_hierarchy: Cow<str> = if named {
format!("name={subsystem_name}").into()
} else {
subsystem_name.into()
};
// we search "net_cls,net_prio", but it store "net_prio,net_cls"
if let Some(proc_path) = process_cgroups.get(named_hierarchy.as_ref()) {
....
} else {
tracing::warn!("Could not mount {:?} cgroup subsystem", subsystem_name);
}
I understood this problem. I have a question. How does runc handle this case? Do you know?
@JCKeep We should fix it, but I wonder how runc treats it.
@utam0k runc use the mount options of /proc/self/mountinfo
to get the cgroup subsystem mounts.
On my system, /proc/self/mountinfo
like:
the subsystem listed in /proc/self/cgroup
will match cgroup mount options.
@JCKeep Which code?
@utam0k you may have a look at function mountCgroupV1 and getCgroupMountsHelper
Why don't you align to runc's implementation?