datamachines / cuda_tensorflow_opencv

Dockerfile with GPU support for TensorFlow and OpenCV


Problem making TensorFlow work with the GPU (for 10.2_2.1.0_4.3.0-20200423)

OkenKhuman opened this issue

First, thanks for helping me out last time.

While working with the "datamachines/cudnn_tensorflow_opencv:10.2_2.1.0_4.3.0-20200423" image, I have no problem enabling CUDA support, but when I try to use TensorFlow with the GPU it is unable to detect it: running "import tensorflow as tf; print(len(tf.config.experimental.list_physical_devices('GPU')))" returns 0.
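
For completeness, this is the kind of check I run inside the container (a minimal sketch of my diagnostic; it only assumes the TensorFlow 2.x that ships in the image):

```python
# GPU-visibility check run inside the container (sketch).
import tensorflow as tf

print("TF version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())

gpus = tf.config.experimental.list_physical_devices('GPU')
print("GPUs visible:", len(gpus))
for gpu in gpus:
    print(" ", gpu)
```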

Is there a way to fix this, or do I need to download another image with CUDA 10.1?

Please help me out.

If possible, please also mention a way to install darknetpy in any of the images (I think it would be a very good enhancement for ML Docker images like this).

Hello Oken, sorry for the lag, I just saw this.
How are you running the container? Are you using docker run --gpus=all?
As for Darknet, I see that YOLOv4 is out; I was going to update a Dockerfile I had in order to build it. Maybe I will add it to an example directory when this is done.

Yes, I use the docker run --gpus=all command prefix. OpenCV's DNN module (GPU backend) and other GPU-accelerated packages like CuPy work well.
Only TensorFlow is not able to detect / use my GPU.
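
For reference, this is how I confirm that the non-TensorFlow GPU stack is healthy (a minimal sketch; it assumes an OpenCV build with its CUDA modules and an installed CuPy, as in the cudnn- images):

```python
# Sanity checks for the non-TensorFlow GPU stack (sketch; assumes OpenCV
# was built with its CUDA modules and that CuPy is installed).
import cv2
import cupy as cp

# Number of CUDA devices OpenCV's cuda module can see.
print("OpenCV CUDA devices:", cv2.cuda.getCudaEnabledDeviceCount())

# Number of CUDA devices visible to CuPy, plus a tiny on-GPU computation.
print("CuPy CUDA devices:", cp.cuda.runtime.getDeviceCount())
result = cp.ones((8, 8)) @ cp.ones((8, 8))
print("CuPy matmul OK:", bool(cp.allclose(result, 8.0)))
```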

tl;dr: still looking into it

long: I am still investigating the main issue, but TF requires CuDNN to work, so the "cuda" (non-CuDNN) variant will have to stay CPU-bound. While looking into it, it appears that the pip-installed TF is built against an older version of CUDA (10.0) and is hard-linked to those libraries, so I added some workarounds in the develop-linux branch as well as some tests (in the test directory) that run some simple TF code on CPU and GPU.
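
The kind of smoke test I mean is just a small computation pinned to each device; roughly (a sketch, not the actual test script in the repo):

```python
# Minimal CPU/GPU smoke test (sketch; not the repo's test script).
import tensorflow as tf

def matmul_on(device_name):
    """Run a small matmul on the given device and return the summed result."""
    with tf.device(device_name):
        a = tf.random.uniform((256, 256))
        b = tf.random.uniform((256, 256))
        return float(tf.reduce_sum(tf.matmul(a, b)))

print("CPU result:", matmul_on("/CPU:0"))
if tf.config.experimental.list_physical_devices('GPU'):
    print("GPU result:", matmul_on("/GPU:0"))
else:
    print("No GPU visible to TensorFlow")
```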

I have some preliminary content in the develop-linux branch that now builds TF from source.
TF needs the CuDNN base to compile the GPU-dependent part.

Yesterday I was able to fully download and use your "datamachines/cudnn_tensorflow_opencv:10.1_2.1.0_4.3.0" build. Hopefully the TF there works with the GPU :-).
Thanks again for this wonderful image; it is very helpful for an engineering student like me.
Also, if you have any paper based on this, I would like to cite it in the project I am working on. Or is it OK if I just give a reference to this repository?

Hi Oken,

I am currently building the "20200615" release, which will have TF built from source so that it makes use of the local CUDA and CuDNN. I would recommend waiting a couple more days before trying this version (I moved the compilation to a system with a lot more cores, and it is still taking a long time).

If you cannot wait for this release, I would encourage you to check out the develop-linux branch and compile the version that works best for you. On my gaming laptop (which is also what I was using before to compile TF), it takes 4-5 hours per build.

Another option is to run this script to load the CUDA 10.0 libraries so TF can use them, but this was more of a workaround than a proper solution; see:
e6d8d0c

In the test directory, you will see a few Python scripts whose names start with tf_; I would run those in the running container to see what the system sees.

Regarding a reference, feel free to cite the GitHub repository.

We also published an article that introduced this abstraction:
"Enabling GPU-Enhanced Computer Vision and Machine Learning Research Using Containers" (Dec 2019) High Performance Computing - ISC High Performance 2019 International Workshops, Lecture Notes in Computer Science Volume 11887
https://link.springer.com/chapter/10.1007/978-3-030-34356-9_8

I have committed to the develop-linux branch a refactoring of the Dockerfile which has so far successfully built all the cudnn- variants. I am waiting for all of them to compile before calling it a success and pushing the images as well.

Confirming that the 20200615 release will solve this (it is currently being pushed to Docker Hub).
Note that you will want to use the cudnn- variant to get GPU access.
Run test/tf_hw.py to obtain the list of functional hardware; during the verbose loading of the CUDA components you will see details about your GPU hardware, confirming it is present.
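
If you want an idea of what such a check reports, the following is roughly equivalent (a sketch only, not the actual test/tf_hw.py script):

```python
# Rough equivalent of a hardware-listing check (sketch; not the repo's
# test/tf_hw.py).
import tensorflow as tf
from tensorflow.python.client import device_lib

# High-level view: physical devices TensorFlow can use.
for dev in tf.config.experimental.list_physical_devices():
    print(dev.device_type, dev.name)

# Detailed view: triggers the verbose CUDA library loading and prints
# device descriptions (including the GPU model when one is present).
for dev in device_lib.list_local_devices():
    print(dev.name, "-", dev.physical_device_desc or "CPU")
```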

Closing this issue at this point.

20200615 is now released, and the pre-built images are available on Docker Hub.

Off-topic, but I just wanted to thank you for your hard work on what is contained in this repo. I've wasted way too much time over the last few years getting TF, OpenCV, and CUDA to play nicely together, and this repo means that I and others hopefully need to spend far less time doing so. So thank you!

You are quite welcome; I use this container very often for the same reason: I need a ready set of tools just to get some OpenCV code functional. Hopefully I will soon extend the Jetson Nano one for doing analytics at the edge :)

darknetpy would unfortunately not be a good solution to use with CTO, as it tries to compile YOLO itself.

PyYOLO (https://github.com/goktug97/PyYOLO), however, uses the already-installed OpenCV and libdarknet.so, and I have confirmed that it works by using their sample.py code; see https://github.com/datamachines/cuda_tensorflow_opencv#641-using-pyyolo
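
For a sense of what that looks like, here is a rough sketch modeled on PyYOLO's sample usage; the constructor arguments, file paths, and model files below are assumptions for illustration, so check the PyYOLO README for the exact API:

```python
# Rough sketch of PyYOLO usage against the image's preinstalled OpenCV and
# libdarknet.so. Paths, parameters, and model files are placeholders, and
# the exact pyyolo API may differ from this sketch (see the PyYOLO README).
import cv2
import pyyolo

# Hypothetical model files: point these at your own cfg/weights/data.
detector = pyyolo.YOLO("yolov3.cfg", "yolov3.weights", "coco.data",
                       detection_threshold=0.5,
                       hier_threshold=0.5,
                       nms_threshold=0.45)

frame = cv2.imread("input.jpg")
for det in detector.detect(frame):
    x_min, y_min, x_max, y_max = det.to_xyxy()
    cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 0, 255), 2)
cv2.imwrite("output.jpg", frame)
```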