CUPTI_ERROR_INSUFFICIENT_PRIVILEGES in Docker
johnbensnyder opened this issue · comments
GPU profiling in Docker requires including the docker run option '--privileged=true'.
Topic is discussed in this issue:
Can Docker setup instructions be included on the profiler setup page?
Good to hear, @ckluk, thank you! If you're open to taking requests, I'd be very interested in a Docker setup in which:
a) GPU profiling works
b) the container is run as a normal user (so that all newly created files, eg logs and saved models, are owned by the user, not root)
but I can't get both to work at the same time. I have the following in the host machine's /etc/modprobe.d/nvidia-kernel-common.conf:
options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
and I ran update-initramfs -u after adding it (and rebooted afterwards).
The Docker container is created by
docker run -it --gpus=all --rm --user "$(id -u):$(id -g)" dom/tensorflow:2.2.0-gpu
(plus some volume binds etc.). Unfortunately, this setup leads to CUPTI_ERROR_INSUFFICIENT_PRIVILEGES.
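One way to confirm that the modprobe option actually took effect after the reboot is to read the driver's parameter listing on the host. The sketch below assumes the driver exposes its parameters in /proc/driver/nvidia/params as "Name: value" lines and that the relevant entry is called RmProfilingAdminOnly; the function name profiling_admin_only is made up for illustration.

```shell
#!/bin/sh
# Report whether the NVIDIA driver restricts GPU profiling to admin users.
# Assumption: the driver lists its parameters as "Name: value" lines in
# /proc/driver/nvidia/params, with the entry named RmProfilingAdminOnly.
# The path can be overridden so the function can be tried on a saved copy.
profiling_admin_only() {
    params_file="${1:-/proc/driver/nvidia/params}"
    if [ ! -r "$params_file" ]; then
        echo "unknown"
        return 0
    fi
    # Split each line on the colon and pick the value of the matching entry.
    value=$(awk -F': *' '$1 == "RmProfilingAdminOnly" { print $2 }' "$params_file")
    case "$value" in
        0) echo "unrestricted" ;;
        1) echo "admin-only" ;;
        *) echo "unknown" ;;
    esac
}

profiling_admin_only
```

If this prints "admin-only" on the host, the option in nvidia-kernel-common.conf was probably not picked up by the loaded driver.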
Thanks @ckluk. I was hoping it's a matter of bad setup, but it's good to hear it'll at least eventually get resolved.
@d-miketa Instead of running the container with --privileged=true, try --cap-add=CAP_SYS_ADMIN
More info: https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti
I ended up doing the following, some subset of which seems to have done the trick:
- updating the host machine to Ubuntu 20.04
- adding options nvidia "NVreg_RestrictProfilingToAdminUsers=0" to /etc/modprobe.d/nvidia-kernel-common.conf and running update-initramfs -u
- adding export CUDA_VERSION="10.1", export LD_LIBRARY_PATH="/usr/local/cuda-${CUDA_VERSION}/lib64:/usr/local/cuda-${CUDA_VERSION}/extras/CUPTI/lib64" and export LD_INCLUDE_PATH="/usr/local/cuda-${CUDA_VERSION}/include:/usr/local/cuda-${CUDA_VERSION}/extras/CUPTI/include" to the host machine's .zshrc
- adding ENV LD_INCLUDE_PATH="/usr/local/cuda/include:/usr/local/cuda/extras/CUPTI/include:$LD_INCLUDE_PATH" to the Dockerfile
- running the Docker container with --privileged
It's possible that --cap-add=CAP_SYS_ADMIN would work as well as --privileged, but I haven't tried.
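For anyone who wants to test the --cap-add variant, a quick check from inside the container can show whether the process really holds CAP_SYS_ADMIN. This is only a sketch: it relies on the CapEff line of /proc/self/status holding the effective capability mask in hex and on CAP_SYS_ADMIN being capability number 21; the function name has_cap_sys_admin is hypothetical.

```shell
#!/bin/sh
# CAP_SYS_ADMIN is capability number 21 in the Linux capability list.
CAP_SYS_ADMIN_BIT=21

# Succeed if the given hex capability mask has the CAP_SYS_ADMIN bit set.
# With no argument, read the effective mask (CapEff) of the current process.
has_cap_sys_admin() {
    mask="$1"
    if [ -z "$mask" ]; then
        mask=$(awk '/^CapEff:/ { print $2 }' /proc/self/status 2>/dev/null)
    fi
    # No mask available (for example, not on Linux): report failure.
    [ -n "$mask" ] || return 2
    [ $(( (0x$mask >> CAP_SYS_ADMIN_BIT) & 1 )) -eq 1 ]
}

if has_cap_sys_admin; then
    echo "CAP_SYS_ADMIN is in the effective set"
else
    echo "CAP_SYS_ADMIN is not in the effective set"
fi
```

Running it once in a container started with --cap-add=CAP_SYS_ADMIN and once in a container started without it should show whether the flag reached the process.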
Hi! How do I pass those parameters to the Docker container?
I ran the following but got an error:
nvidia-docker run -d -it --name retina_net -v /home/readib/Experiments/:/ -p 8000:8888 -v /tmp/.X11-unix/:/tmp/.X11-unix -e DISPLAY=$DISPLAY retina_net:latest --cap-add=CAP_SYS_ADMIN /bin/bash
Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "--cap-add=CAP_SYS_ADMIN": executable file not found in $PATH: unknown
Thank you.
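Docker options must come before the image name; anything after the image name is treated as the command to run inside the container, which is why --cap-add=CAP_SYS_ADMIN was reported as an executable not found in $PATH. To run the container: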
nvidia-docker run '--privileged=true' -d -it --name retina_net -v /home/readib/Experiments/:/home -p 8000:8888 -v /tmp/.X11-unix/:/tmp/.X11-unix -e DISPLAY=$DISPLAY retina_net:latest /bin/bash
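The same ordering applies to the --cap-add variant suggested earlier in the thread. As a rough sketch, a small helper can assemble the command line so that the capability option always lands before the image; the function name profiling_run_cmd is invented here, and it only prints the command instead of running it.

```shell
#!/bin/sh
# Print an nvidia-docker command line with the capability option placed
# before the image name.
# Usage: profiling_run_cmd IMAGE [COMMAND...]
profiling_run_cmd() {
    image="$1"
    shift
    # Options first, then the image, then the command to run in the container.
    echo "nvidia-docker run --cap-add=CAP_SYS_ADMIN -it $image $*"
}

profiling_run_cmd retina_net:latest /bin/bash
```

Volume binds, port mappings and environment variables such as the ones in the commands above belong in the same position, before the image name.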