NVIDIA / deepops

Tools for building GPU clusters

Cgroup v2 support for SLURM cluster (singularity, grafana, slurm)

biocyberman opened this issue

Hi
We have enterprise support for SuperPOD, but I am writing here to reach the team directly rather than going through the enterprise support portal. While investigating Out-Of-Memory problems with Slurm jobs, we found that newer Singularity versions (>3.10) support cgroup v2, which offers finer-grained control of a container's memory usage. I have made changes so that deepops can install Singularity 3.10.2, but many other things still need to be done before Slurm jobs can run with this version. I would therefore like to ask the deepops team to extend deepops so that it supports Singularity versions >3.10.
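For anyone trying to reproduce this, a quick way to see whether a compute node is actually running the unified cgroup v2 hierarchy is to look for `cgroup.controllers` at the cgroup mount root. The snippet below is only a rough sketch: the function name `cgroup_mode` and the default path `/sys/fs/cgroup` are my own assumptions for illustration, and it is not part of deepops.

```python
from pathlib import Path


def cgroup_mode(root: str = "/sys/fs/cgroup") -> str:
    """Classify the cgroup mount at `root` (hypothetical helper, not a deepops API).

    Returns 'v2' for a unified hierarchy, 'v1-or-hybrid' when the mount exists
    but has no top-level cgroup.controllers file, and 'unknown' when the
    directory is missing.
    """
    base = Path(root)
    if not base.is_dir():
        return "unknown"
    # On a unified (v2) hierarchy the kernel exposes cgroup.controllers
    # directly in the root of the mount.
    if (base / "cgroup.controllers").is_file():
        return "v2"
    return "v1-or-hybrid"


if __name__ == "__main__":
    print(cgroup_mode())
```

Running it on each node (for example through `srun`) should show whether the nodes are consistent before testing a newer Singularity on them.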

This issue is stale because it has been open for 60 days with no activity. Please update the issue or it will be closed in 7 days.