JiahuiYu / neuralgym

Deep Learning Toolkit

Requirements

AziziShekoofeh opened this issue · comments

Hi,

Thanks for providing the package. I have a question about the requirements. I am trying to train the generative inpainting code [https://github.com/JiahuiYu/generative_inpainting], which uses the neuralgym package. At the start of the training procedure [after the GPUs are assigned and even after the graphs are generated], I get a few errors related to the versions of cuDNN and CUDA that I am using. Previously, however, I had no issues running the test code of the generative inpainting model.

Ok, long story short, previously I had:
cuDNN v7.1.4
CUDA 9.0
Nvidia Driver 384.130

After the training error, it suggested that I upgrade to cuDNN v7.2.x. Now, with the new cuDNN version, I am getting:
CUDNN_STATUS_NOT_INITIALIZED

and again it suggests that I need to upgrade my driver. So I upgraded to Nvidia driver 390.87, but now it fails to detect all of the GPUs and reports that there are not enough GPUs, even though I have enough.

My question is: is there any safe, tested combination of TensorFlow, Nvidia driver, CUDA, and cuDNN versions that works well with neuralgym? I would really appreciate it if you could share your versions.

-Many Thanks, Shek

@AziziShekoofeh Can you obtain results with nvidia-smi?

You mean the versions and so on? Yes, nvidia-smi is working.
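For readers who want to compare the GPUs and driver version reported by nvidia-smi against what training expects, the query can be scripted. This is only a sketch: the helper names `parse_gpu_csv` and `query_gpus` are made up for illustration, and it assumes `nvidia-smi` is on the PATH and supports the `--query-gpu` and `--format=csv,noheader` options.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse 'index, name, driver_version' CSV rows into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip anything that is not a three-column row
        index, name, driver = parts
        gpus.append({"index": int(index), "name": name, "driver": driver})
    return gpus


def query_gpus():
    """Ask nvidia-smi for the index, name and driver version of each GPU."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,name,driver_version",
         "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_gpu_csv(out)


if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            print(gpu)
    except (OSError, subprocess.CalledProcessError) as exc:
        print("nvidia-smi query failed:", exc)
```

If the number of rows printed here is smaller than the number of GPUs physically installed, the problem is likely at the driver level rather than in TensorFlow or neuralgym.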

@AziziShekoofeh Ok. I guess the problem is with TensorFlow-GPU. Please refer to issues like this one.

Can you successfully train other code/models with TensorFlow? I wonder whether the problem is on my side or on the TensorFlow/environment side.

Thanks for your prompt responses. Yes, I could.
Anyway, I finally found a working combination. I am writing it down here for other people's reference.

My previous setting [it was fine for testing, but I had an issue with training]:
cuDNN v7.1.4
CUDA 9.0
Nvidia Driver 384.130
TF 1.4

My current setting after the upgrades:
cuDNN v7.2.x
CUDA 9.0
Nvidia Driver 396.54
TF 1.4

I think neuralgym needs cuDNN v7.2.x or higher [I don't know exactly where this dependency comes from], and the Nvidia driver should be 396.54 or higher.
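For anyone checking their own machine against the versions above, a minimal sketch of the comparison follows. The minimums are simply the ones reported in this comment (cuDNN 7.2, driver 396.54), not officially documented requirements, and the names `parse_version`, `meets_minimum`, `MINIMUMS` and `check` are hypothetical.

```python
# Minimum versions as reported in this thread; treat them as an
# observation from one setup, not as an official requirement.
MINIMUMS = {"cudnn": "7.2", "driver": "396.54"}


def parse_version(text):
    """Turn a dotted version string such as '7.1.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check(installed):
    """Map each component name to whether it meets the reported minimum."""
    return {
        name: meets_minimum(installed[name], minimum)
        for name, minimum in MINIMUMS.items()
    }


if __name__ == "__main__":
    # The previous setting from this thread, which failed during training.
    print(check({"cudnn": "7.1.4", "driver": "384.130"}))
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating "384.130" as newer than "396.54" or "7.10" as older than "7.2".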

Anyway, many thanks for the response and the useful package.

@AziziShekoofeh Thanks, and I also appreciate your contribution in sharing your experience.

In my understanding, the neuralgym package does not itself require any cuDNN-related libraries. cuDNN is required implicitly when tensorflow is imported.
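Since the GPU stack is loaded through tensorflow, one way to separate environment problems from neuralgym is to ask TensorFlow directly which GPUs it can see. The sketch below assumes a TensorFlow install that exposes `tensorflow.python.client.device_lib`; the function name `visible_gpu_names` is made up for illustration.

```python
def visible_gpu_names():
    """Return the GPU device names TensorFlow can see, or None without TF."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None  # tensorflow is not installed in this environment
    devices = device_lib.list_local_devices()
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    names = visible_gpu_names()
    if names is None:
        print("tensorflow is not installed")
    else:
        print("GPUs visible to TensorFlow:", names)
```

If this list is empty or shorter than what nvidia-smi reports, the mismatch is probably between TensorFlow and the driver/CUDA/cuDNN install rather than in neuralgym.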