NVIDIA/deepops
Tools for building GPU clusters
Stargazers: 1230
Watchers: 52
Issues: 426
Forks: 316
NVIDIA/deepops Issues
Compatibility with DGX H100 | Closed | 3 months ago | Comments: 1
Enabling persistent MIG in GPU instances of DGX-A100 | Closed | 3 months ago | Comments: 2
NIS configuration | Updated | 3 months ago
Deepops Slurm NCCL Fail | Closed | 5 months ago | Comments: 2
crictl does not respect proxy config | Closed | a year ago | Comments: 3
Error Running ansible-playbook on slurm-cluster: Docker-ce Repository Activation Issue | Closed | 6 months ago | Comments: 1
playbook slurm-cluster fails on DGX OS 6 on nvidia-peer-memory task | Closed | 6 months ago | Comments: 1
TLS certificate replacement steps are unclear | Closed | 6 months ago | Comments: 1
Extend single node K8s DeepOps with additional nodes | Closed | 7 months ago | Comments: 1
node exporters don't work after initial run of slurm playbook | Closed | 8 months ago | Comments: 5
nodelocaldns forever crashing/restarting [Info/Solution] | Closed | 9 months ago | Comments: 2
Update the Network Operator | Closed | 9 months ago | Comments: 1
Error: alpine-glibc-shim was not installed | Closed | 9 months ago | Comments: 2
K8s dashboard is not viewable by default due to https configuration | Closed | 9 months ago | Comments: 1
ERROR! 'include' is not a valid attribute for a Play | Closed | 10 months ago | Comments: 2
Building Slurm with Lua | Closed | 10 months ago | Comments: 2
NVML version + H100 GPU | Closed | a year ago | Comments: 3
Deos Deepops support NVIDIA driver version 515 or 525? | Closed | a year ago | Comments: 1
Error mounting /home: umount: /home: target is busy | Closed | a year ago | Comments: 2
slurm-master without GPU failed at nvml autodetect | Closed | a year ago | Comments: 2
Slurm build deps on Ubuntu missing libdbus-1-dev | Closed | a year ago | Comments: 2
Docker installation playbook no longer working | Closed | a year ago
nvme Operation not permitted | Closed | a year ago | Comments: 1
Is this proyect alive? | Closed | a year ago | Comments: 3
no token generate with ./scripts/k8s/deploy_dashboard_user.sh | Closed | a year ago | Comments: 3
Deepops upgrade issue v21.06 | Closed | a year ago | Comments: 5
[HELP] How can we add all available gpus? | Closed | a year ago | Comments: 1
[ISSUE][deepops, tag: 20.04.2] In CentOS 7.9 x64, msg: 'Not a public key: https://getfedora.org/static/fedora.gpg' | Closed | a year ago | Comments: 2
the role 'kubespray-defaults' was not found | Closed | a year ago | Comments: 2
Uninstalling SLURM | Closed | a year ago | Comments: 7
Issue with K8 Cluster not detecting GPUs | Closed | a year ago | Comments: 2
Is ssh into the Enroot container supposed to be passwordless? | Closed | a year ago | Comments: 1
Ensure docker-ce repository is enabled failed | Closed | a year ago
Ports closed on docker startup | Closed | a year ago | Comments: 4
Uninstall DeepOps and single-node slurm completely | Closed | a year ago | Comments: 1
[Error] When provisioing the k8s cluster, an error occurs when setup.sh running the script. - ImportError: cannot import name 'soft_unicode' from 'markupsafe' | Closed | a year ago | Comments: 3
Error while trying for air gapped environment | Closed | a year ago | Comments: 2
./scripts/k8s/verify_gpu.sh fail | Closed | a year ago | Comments: 3
Bump Network Operator | Closed | a year ago | Comments: 1
GPU is disassociating after running a playbook | Closed | a year ago | Comments: 3
Any plans of OnDemand support for Kubernetes cluster? | Closed | a year ago | Comments: 1
NVIDIA deepops is support GPU Time Slicing ? | Closed | a year ago | Comments: 4
golang install fails | Closed | a year ago | Comments: 4
2 slurm clusters in Deepops | Closed | a year ago | Comments: 3
msg: apt cache update failed | Closed | a year ago | Comments: 3
Galaxy setup failed | Closed | 2 years ago | Comments: 1
Cgroup v2 support for SLURM cluster (singuality, grafana, slurm) | Closed | 2 years ago | Comments: 1
Implementation Fails on RHEL 7.6 - UndefinedError: 'dict object' has no attribute 'kube_node' | Closed | 2 years ago | Comments: 1
URL to be whitelisted for Hybrid cluster deployment | Closed | 2 years ago | Comments: 1
SLURM Installation: ansible-playbook picks wrong python interpreter | Closed | 2 years ago | Comments: 2