datamachines / cuda_tensorflow_opencv

DockerFile with GPU support for TensorFlow and OpenCV

cudnn_tensorflow_opencv docker image does not detect GPU when running opencv dnn module

sulebaynes opened this issue · comments

Hi, I need OpenCV's dnn module to construct a machine learning model and do a single forward pass. When I run the Python code without the cudnn_tensorflow_opencv docker image, the video I am processing takes about 45 seconds; when I use the cudnn_tensorflow_opencv docker image, it takes the same time. Monitoring through nvidia-smi shows the GPU is not used, and neither OpenCV nor anything else prints any info about a detected GPU. I use Docker version > 20, and I also used the flag --gpus all just in case. Nothing happens. How can I use the GPU with OpenCV's dnn module?

I am unsure of the exact code in OpenCV that you are using, but are you requesting OpenCV to use the CUDA Backend?

net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

There is also

net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)

Otherwise, which tag are you using for cudnn_tensorflow_opencv? One thing to check is whether the OpenCV build supports your GPU architecture (see the build log).
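Putting those two calls together, here is a minimal Python sketch of routing a cv2.dnn network onto the CUDA backend. The enable_cuda() helper and the model file names are mine, not part of OpenCV; the import guard is only there so the sketch can be read without OpenCV installed.

```python
# Sketch: ask OpenCV's dnn module to run inference on the GPU.
# enable_cuda() is a hypothetical helper; only the two
# setPreferable* calls come from the OpenCV API.
try:
    import cv2
    BACKEND_CUDA = cv2.dnn.DNN_BACKEND_CUDA
    TARGET_CUDA = cv2.dnn.DNN_TARGET_CUDA
    TARGET_CUDA_FP16 = cv2.dnn.DNN_TARGET_CUDA_FP16
except ImportError:  # placeholder sentinels so the sketch imports cleanly
    BACKEND_CUDA, TARGET_CUDA, TARGET_CUDA_FP16 = object(), object(), object()

def enable_cuda(net, fp16=False):
    """Request the CUDA backend/target on a cv2.dnn network.

    Note: if the OpenCV build lacks CUDA support, dnn silently falls
    back to CPU at the first forward pass, so watch the warning logs."""
    net.setPreferableBackend(BACKEND_CUDA)
    net.setPreferableTarget(TARGET_CUDA_FP16 if fp16 else TARGET_CUDA)
    return net

# Hypothetical usage (model files are placeholders):
#   net = enable_cuda(cv2.dnn.readNet("model.weights", "model.cfg"))
#   out = net.forward()
```

DNN_TARGET_CUDA_FP16 can be noticeably faster, but only on GPUs with good half-precision throughput.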

Thank you very much for your quick reply and for solving my issue.
After setting up my net and letting OpenCV know which backend to use with the lines you sent,

net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

I started to get a more promising error:

CUDA driver version is insufficient for CUDA runtime version in function 'ManagedPtr'

So far I have tried the tags 11.2.0_2.4.1_4.5.1-20210211 and 10.2_2.4.1_4.5.1-20210211 but both resulted in the above error. I understand that it is somewhat related to version issues now. I will try other tags of yours. Here is my nvidia-smi output, if you happen to know what I should do next or what tag I should use:

[screenshot: nvidia-smi output]

460.32.03 is not really old; it was released in January and is listed as supporting CUDA 11.2.0 in Table 2 of https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html, so I am not entirely sure.

Would you be able and willing to try building your own version of the container to check whether that fixes your problem?
I understand it somewhat defeats the purpose of being able to pull from Docker Hub, but I wonder if I built it against a different driver version and that is what is causing this issue.

To do so, simply grab the latest release from https://github.com/datamachines/cuda_tensorflow_opencv/releases/tag/20210211 and, with docker and build-essential installed (I think that ought to be enough), you should be able to run: make cudnn_tensorflow_opencv-11.2.0_2.4.1_4.5.1
The process is going to take time (and be very CPU heavy).
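Whether you use a published image or your own build, one way to confirm what the OpenCV build actually enabled is to parse cv2.getBuildInformation(). A sketch, assuming the "NVIDIA CUDA" / "cuDNN" / "NVIDIA GPU arch" key names that standard OpenCV builds print (verify against your own dump):

```python
# Sketch: extract CUDA-related lines from cv2.getBuildInformation().
# The key names below are assumptions based on typical OpenCV dumps.
def cuda_build_flags(build_info):
    """Return a dict of the CUDA-related entries of the build dump."""
    wanted = ("NVIDIA CUDA", "cuDNN", "NVIDIA GPU arch")
    flags = {}
    for line in build_info.splitlines():
        key, sep, val = line.partition(":")
        key, val = key.strip(), val.strip()
        if sep and key in wanted:
            flags[key] = val
    return flags

# Usage inside the container (requires cv2):
#   import cv2
#   print(cuda_build_flags(cv2.getBuildInformation()))
```

If "NVIDIA CUDA" reports NO, or the listed GPU architectures do not include yours, the dnn module will fall back to CPU regardless of the backend you request.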

It is likely not a driver version issue. You have a Tesla K80, correct?

According to https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the K80 has a "compute capability (version)" of 3.7.
The minimum architecture the containers are built for starts at 6.0; see

# CUDNN needs 5.3 at minimum, extending list from https://en.wikipedia.org/wiki/CUDA#GPUs_supported
# Skipping Tegra, Jetson, ... (ie not desktop/server GPUs) from this list
# Keeping from Pascal and above
# Also only installing cudnn7 for 18.04 based systems
DNN_ARCH_CUDA9=6.0,6.1,7.0
DNN_ARCH_CUDA10=6.0,6.1,7.0,7.5
DNN_ARCH_CUDA11=6.0,6.1,7.0,7.5,8.0,8.6

So none of the builds we generated support that architecture. You will have to build your own (as described earlier); I am unclear whether DNN will be supported, given the comment I added at the time that 5.3 is the minimum for cuDNN.

If you are willing to try, modify the Makefile's DNN_ARCH_CUDA variables to add 3.7 (for example, DNN_ARCH_CUDA11=3.7,6.0,6.1,7.0,7.5,8.0,8.6) and run make for the build you need.
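For clarity, the edited Makefile variables would then read as follows (a sketch of the change described above; only the CUDA 11 line matters for the 11.2.0 builds):

```makefile
# 3.7 added for the Tesla K80; note cuDNN's documented 5.3 minimum
DNN_ARCH_CUDA9=3.7,6.0,6.1,7.0
DNN_ARCH_CUDA10=3.7,6.0,6.1,7.0,7.5
DNN_ARCH_CUDA11=3.7,6.0,6.1,7.0,7.5,8.0,8.6
```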

Based on those notes, I added content to the main README.md on supported GPUs, adding a GPU architecture to a build, and using the OpenCV DNN CUDA backend.

I also tried something: I built a one-off container image with 3.7 enabled.
I will note that I got this warning as I did: nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
If you are willing to try, it is publicly available as martialdmc/cudnn_tensorflow_opencv:11.2.0_2.4.1_4.5.1-20210211
(notice the change in the source to martialdmc)

Let me know if this helped.

@sulebaynes any resolution on your end?
I will likely close this by the end of the week.

Hi Martial, very sorry for taking so long. As soon as you pointed out that OpenCV's dnn module requires a GPU with a minimum compute capability of 5.3, we had to quickly move to using the TensorFlow and Keras frameworks instead of OpenCV for inference in our AI models. I had time this morning to try the Docker image you shared and check whether the model runs on the GPU; however, I received the following error:
[ WARN:0] global /tmp/pip-req-build-ms668fyv/opencv/modules/dnn/src/dnn.cpp (1442) setUpNet DNN module was not built with CUDA backend; switching to CPU
Thanks so much for your effort and for trying to solve my very specific problem yourself; I am very grateful.
I think as long as the company I am working for sticks with Tesla K80s, OpenCV's CUDA/cuDNN backend will not be an option for us.

Thank you for following up, this is unfortunate.
I was hoping that OpenCV would work, given that the build only listed those architectures as deprecated (not unavailable).
Good luck with this effort.