GPU isolation options
andy108369 opened this issue · comments
We want to make sure one cannot request access to more AMD GPUs than they have been allocated by setting certain environment variables (e.g. `HIP_VISIBLE_DEVICES` / `ROCR_VISIBLE_DEVICES`).
I am not sure whether this is an issue as of today; we cannot verify it since we don't currently have a box with more than one AMD GPU.
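To illustrate the concern, here is a rough sketch of what such an abuse could look like. This is a hypothetical Pod manifest (image name and device indices are made up): the Pod requests a single GPU through the device plugin, but sets a ROCm visibility variable that, if honored by the runtime, could expose additional devices:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rocm-escape-test   # hypothetical name, for illustration only
spec:
  containers:
    - name: app
      image: rocm/dev-ubuntu-22.04   # example ROCm-enabled image
      env:
        # The question: does setting this grant visibility into GPUs
        # beyond the single one requested below?
        - name: HIP_VISIBLE_DEVICES
          value: "0,1,2,3"
      resources:
        limits:
          amd.com/gpu: 1   # only one GPU actually requested
```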
For context: it is possible to expose access to all NVIDIA GPUs on the host by setting the `NVIDIA_VISIBLE_DEVICES=all` environment variable on a Pod. Luckily, we were able to work around this by setting `--set deviceListStrategy=volume-mounts` for the `nvdp/nvidia-device-plugin` Helm chart, along with these settings in the `/etc/nvidia-container-runtime/config.toml` file:
```toml
accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false
```
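For reference, the Helm-side change above could look roughly like this (the release name and namespace here are assumptions, not taken from our actual deployment):

```shell
# Sketch: switch the NVIDIA device plugin to the volume-mounts device
# list strategy, so GPU visibility is granted via mounts managed by
# the plugin rather than via env vars the workload can set itself.
helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --set deviceListStrategy=volume-mounts
```

With `accept-nvidia-visible-devices-envvar-when-unprivileged = false`, an unprivileged container that sets `NVIDIA_VISIBLE_DEVICES=all` on its own should no longer gain extra devices. Whether an equivalent strategy exists for the AMD device plugin and ROCm runtime is the open question of this issue.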