afruehstueck / tileGAN

Code for TileGAN: Synthesis of Large-Scale Non-Homogeneous Textures (SIGGRAPH 2019)


Running on Tensorflow 2.0 + CUDA 10.1

powerspowers opened this issue · comments

Can revert to v1 behavior by doing the following:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

Does the .pkl package include any TensorFlow code? I can't figure out where get_variable is being referenced, since it's not in any of the main .py files. Does the .pkl include a _build_func, and is that what's calling .get_variable?

Traceback (most recent call last):
File "tileGAN_server.py", line 1234, in
manager = TFManager()
File "tileGAN_server.py", line 90, in init
self.initDataset(availDatasets[0])
File "tileGAN_server.py", line 115, in initDataset
self.initNetworks()
File "tileGAN_server.py", line 230, in initNetworks
Gs = pickle.load(file)
File "D:\TileGAN\tfutil.py", line 577, in setstate
self._init_graph()
File "D:\TileGAN\tfutil.py", line 482, in _init_graph
out_expr = self._build_func(*self.input_templates, is_template_graph=True, **self.static_kwargs)
File "", line 236, in G_paper
AttributeError: module 'tensorflow' has no attribute 'get_variable'

@powerspowers I unfortunately have to agree that debugging the ProGAN framework is quite confusing.
The pkl contains the name of the build function, which is G_paper for the Generator network. This is the actual function that is called here:

File "D:\TileGAN\tfutil.py", line 482, in _init_graph
out_expr = self._build_func(*self.input_templates, is_template_graph=True, **self.static_kwargs)

The Generator network is then initialized.
TF 2.0 no longer supports get_variable. You could try to work around this using tf.compat.v1.get_variable, but as TF 2.0 is quite different from TF 1.x, you're likely to run into more trouble down the line. I'd recommend setting up an environment with TF 1.8 for running TileGAN.

I imagine I could unpickle the file and modify the python to add:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

I may end up uninstalling TF 2.0 and dropping back to 1.8 in the end, but I'm usually stubborn about using the latest versions of things. My idea was to get this all working with the v1 compatibility layer and then update the code to follow true 2.0 practices after that.

I totally understand the spirit :) Good luck, and I hope you can work it out.
I chose to build on ProGAN, but like you, had a hard time modifying the code to do what I wanted.

I believe TF 2 chose not to maintain compatibility with previous versions in order to create a more homogeneous library, maybe similar to the switch from Python 2.x to Python 3.

Is there a way to see the source code for G_paper? Does it import TensorFlow inside the function, or do you think I can somehow set the context so that tf is tensorflow.compat.v1 when G_paper runs?

The rest of the TileGAN code seems to do okay when I disable v2 behavior and set tf to be tensorflow.compat.v1

With the above context in place, G_paper is then free to use get_variable.


Also, what version of CUDA do you use with TF 1.8, just in case I uninstall and go that route? ;)

The source code of the build function (G_paper in this case) is stored in the network pkl as net._build_func. I believe this is implemented for archival purposes, so that many different versions of the networks can be restored. To "inject" a new build function and/or modified parameters into a loaded network, I created the clone_and_update function, which sets network parameters such as input and output resolution and finds the build function source based on the specified name.
You may be able to fix your issue by applying the compatibility import before the clone_and_update function in the server is called.

My environment uses CUDA 9.0, cuDNN 7.1.2 and Tensorflow 1.8.

Oh, I see. G_paper is defined in networks.py, and I assume it gets pickled when the processed ProGAN model is saved. So if I were to create my own processed models following the instructions, then my G_paper would be pickled with the v2 functions turned off and v1 compatibility turned on?

Yes, if you have your compatibility code within that function block, that should work.

Okay, I gave in and reverted to TF 1.5 with GPU support. The server runs, and I'm now installing the client, although it's saying that a DLL load failed. I might have mucked up my Qt install …

After I learn how to convert my own ProGAN models, I'll likely try to upgrade again and get it all working under TF 2.0. I have mostly trained my models using BMSG-GAN, which is a PyTorch variation of ProGAN. Can I convert those models over to TileGAN?

I think your best bet would be to download one of my pretrained networks and look into the exact structure of the data and where and how the weights are stored in the pkl. You may be able to write a converter from that. You'll have to make sure to get all the layer names in the network right; otherwise the network won't be able to initialize from the checkpoint.

Hmmm, that could be challenging. I was not able to run the TensorFlow ProGAN on my card (a 1080 with 8GB), which is why I turned to BMSG-GAN [https://github.com/akanimax/BMSG-GAN].

I have a new RTX 2080 11GB to install that might allow me to use ProGAN.

So when you trained your ProGAN models, you took high-resolution art images and diced them up into tiles, then fed those tiles to the GAN for training? (It's surprising how few super-high-resolution artworks are publicly available, right?)


Separately, I had the thought that using the guide image as an actual seed image for randomizing the textures would be interesting: the guide image as a skeleton for the random blobs.

I trained some of my models on a single Titan Xp with 12GB, so your new RTX should work!
Yes, the models are trained on high-resolution tiles snipped from artworks. I had the best success with paintings from the Google Art Project and Wikimedia images. It could also be interesting to accumulate data from different paintings, but I haven't tried that.

I have some models trained on Color Field paintings, Abstract Expressionism, and modern portraits. It would be interesting to see what clusters are in there and what your code would do with them.

I plan to install the 2080 this weekend.