Error for running experiments

Question

yunhunJang opened this issue 8 years ago · comments

I'm trying to run the experiments with MNIST.

I used the command PYTHONPATH='.' python launchers/run_mnist_exp.py

However, it gives me the error

ValueError: Variable d_net/conv_batch_norm/conv_batch_norm/conv_batch_norm_2/conv_batch_norm/conv_batch_norm/moments/normalize/mean/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

It might be caused by the fact that I'm using a recent version of the TensorFlow master branch. What do I need to change to make this code run on a current setup?

Thanks,

SUPER-MARIO · Answer 1 · Tue Dec 06 2016 14:31:28 GMT+0800 (China Standard Time)

Same issue, hope for help.

Alex Coventry · Answer 2 · Thu Dec 08 2016 07:11:15 GMT+0800 (China Standard Time)

Try setting

tensorflow==0.9.0
prettytensor==0.6.2

in requirements.txt. (Versions inferred from the chronology of the git histories.) Make sure you're also using the tensorflow/tensorflow:0.9.0-gpu image with nvidia-docker.
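
For anyone who prefers to script the pinning rather than edit requirements.txt by hand, here is a minimal sketch. The helper name `pin_requirements` is hypothetical, and the versions are only the ones suggested in this thread, not a verified compatibility matrix. It pins packages that are already listed and does not add missing ones.

```python
import re

# Versions suggested in this thread; a starting point, not a guarantee.
PINS = {"tensorflow": "0.9.0", "prettytensor": "0.6.2"}


def pin_requirements(text, pins=PINS):
    """Return requirements text with the packages in `pins` pinned to exact versions."""
    out = []
    for line in text.splitlines():
        # The package name is everything before the first version operator,
        # extras bracket, environment marker, or whitespace.
        name = re.split(r"[<>=!~\[;\s]", line.strip(), maxsplit=1)[0].lower()
        if name in pins:
            out.append("{}=={}".format(name, pins[name]))
        else:
            # Leave unrelated packages, comments, and blank lines untouched.
            out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read requirements.txt, pass the text through `pin_requirements`, and write the result back before running pip install -r requirements.txt.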

pmiller10 · Answer 3 · Fri Dec 09 2016 05:19:22 GMT+0800 (China Standard Time)

I hit the same issue. I switched to prettytensor==0.6.2 but still used tensorflow==0.12.0 and that seemed to solve it.

Yunhun Jang · Answer 4 · Fri Dec 09 2016 09:42:11 GMT+0800 (China Standard Time)

@pmiller10 I removed the previous prettytensor==0.7.1 and reinstalled prettytensor==0.6.2, but it still does not work. Did you run it with Docker? Could you describe your setup in detail?

pmiller10 · Answer 5 · Tue Dec 13 2016 07:55:06 GMT+0800 (China Standard Time)

@yunhunJang My steps are:

1. git clone git@github.com:openai/InfoGAN.git
2. sudo docker run -v $(pwd)/InfoGAN:/InfoGAN -w /InfoGAN -it -p 8888:8888 gcr.io/tensorflow/tensorflow:r0.9rc0-devel
3. Confirm which versions of prettytensor and tensorflow you have:
   pip freeze | grep 'tensor'
   At this point, all I have is tensorflow==0.9.0rc0.
4. Edit requirements.txt: change prettytensor -> prettytensor==0.6.2
5. pip install -r requirements.txt. After this, check again which versions you have. This is what I've got:

   prettytensor==0.6.2
   tensorflow==0.9.0rc0

6. PYTHONPATH='.' python launchers/run_mnist_exp.py
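
The version check in the steps above can also be done programmatically. This is a rough sketch with hypothetical helper names (`parse_freeze`, `check_versions`). It assumes you paste or pipe in the output of pip freeze, and it only compares exact version strings.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into a {name: version} dict, skipping other lines."""
    versions = {}
    for line in text.splitlines():
        if "==" in line:
            name, version = line.strip().split("==", 1)
            versions[name.lower()] = version
    return versions


def check_versions(installed, expected):
    """Return a list of human-readable mismatches between two {name: version} dicts."""
    problems = []
    for name, wanted in sorted(expected.items()):
        found = installed.get(name)
        if found is None:
            problems.append("{} is not installed (wanted {})".format(name, wanted))
        elif found != wanted:
            problems.append("{} is {} (wanted {})".format(name, found, wanted))
    return problems
```

An empty list from `check_versions` means the environment matches the versions you asked for, for example prettytensor 0.6.2 and tensorflow 0.9.0rc0 as listed above.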

Yunhun Jang · Answer 6 · Thu Dec 15 2016 16:25:43 GMT+0800 (China Standard Time)

So, you used tensorflow 0.9, right?
If I change it to v0.9, it works well.
But I still wonder how I can make it work with tensorflow v0.12 (and prettytensor 0.7.2, which is the latest version).

It gives an error in the ExponentialMovingAverage operation in conv_batch_norm in custom_ops.py.
The error message is as follows:

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Extracting MNIST/train-images-idx3-ubyte.gz
Extracting MNIST/train-labels-idx1-ubyte.gz
Extracting MNIST/t10k-images-idx3-ubyte.gz
Extracting MNIST/t10k-labels-idx1-ubyte.gz
batch_norm
g_net/fc_batch_norm
g_net/fc_batch_norm/batch_norm
batch_norm
g_net/fc_batch_norm_1
g_net/fc_batch_norm_1/batch_norm
conv_batch_norm
g_net/conv_batch_norm
g_net/conv_batch_norm/conv_batch_norm
custom_conv2d
d_net/custom_conv2d
d_net/custom_conv2d/custom_conv2d
custom_conv2d_1
d_net/custom_conv2d_1
d_net/custom_conv2d_1/custom_conv2d_1
conv_batch_norm
d_net/conv_batch_norm
d_net/conv_batch_norm/conv_batch_norm
batch_norm
d_net/fc_batch_norm
d_net/fc_batch_norm/batch_norm
custom_conv2d
d_net/custom_conv2d
d_net/custom_conv2d/custom_conv2d
custom_conv2d_1
d_net/custom_conv2d_1
d_net/custom_conv2d_1/custom_conv2d_1
conv_batch_norm
d_net/conv_batch_norm
d_net/conv_batch_norm/conv_batch_norm
Traceback (most recent call last):
  File "launchers/run_mnist_exp.py", line 65, in <module>
    algo.train()
  File "/home/yhoon/InfoGAN/infogan/algos/infogan_trainer.py", line 210, in train
    self.init_opt()
  File "/home/yhoon/InfoGAN/infogan/algos/infogan_trainer.py", line 53, in init_opt
    real_d, _, _, _ = self.model.discriminate(input_tensor)
  File "/home/yhoon/InfoGAN/infogan/models/regularized_gan.py", line 72, in discriminate
    reg_dist_flat = self.encoder_template.construct(input=x_var)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1246, in construct
    return self._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1173, in _construct
    result = self._method(*method_args, **method_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/scopes.py", line 158, in __call__
    return self._call_func(args, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/scopes.py", line 131, in _call_func
    return self._func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1922, in _with_method_complete
    return input_layer._method_complete(func(*args, **kwargs))
  File "/home/yhoon/InfoGAN/infogan/misc/custom_ops.py", line 27, in __call__
    self.ema_apply_op = self.ema.apply([self.mean, self.variance])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 391, in apply
    self._averages[var], var, decay, zero_debias=zero_debias))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
    update_delta = _zero_debias(variable, value, decay)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 177, in _zero_debias
    trainable=False)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 650, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable d_net/conv_batch_norm/conv_batch_norm/conv_batch_norm_2/conv_batch_norm/conv_batch_norm/conv_batch_norm_2/conv_batch_norm/conv_batch_norm/moments/normalize/mean/ExponentialMovingAverage/biased does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

originally defined at:
  File "launchers/run_mnist_exp.py", line 48, in <module>
    network_type="mnist",
  File "/home/yhoon/InfoGAN/infogan/models/regularized_gan.py", line 37, in __init__
    custom_conv2d(128, k_h=4, k_w=4).

I added some print statements to custom_conv2d and conv_batch_norm, as follows:

...
    def __call__(self, input_layer, output_dim,
                 k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, in_dim=None, padding='SAME',
                 name="conv2d"):
        print(name)
        print(tf.get_variable_scope().name)
        with tf.variable_scope(name):
            print(tf.get_variable_scope().name)
...

...
        shp = in_dim or shape[-1]
        print(name)
        print(tf.get_variable_scope().name)
        with tf.variable_scope(name) as scope:
            print(tf.get_variable_scope().name)
            self.gamma = self.variable("gamma", [shp], init=tf.random_normal_initializer(1., 0.02))
...

Any hints would be appreciated. I'm new to tensorflow, so it is hard to know where to look.
(I tested a simple NN of my own with a similar flow in tensorflow 0.12 and prettytensor 0.7.2, and it works fine. I think the custom batch_norm/conv code conflicts with the latest versions of tensorflow/prettytensor.)

Thanks!

Tudor Achim · Answer 7 · Fri Dec 16 2016 05:34:07 GMT+0800 (China Standard Time)

This is due to tensorflow fixing a problem with EMA -- see VittalP/UnsupGAN#1 for a fix.

Yunhun Jang · Answer 8 · Fri Dec 16 2016 12:25:52 GMT+0800 (China Standard Time)

@tachim Thank you! It works well now! I really appreciate it.

lyhangustc · Answer 9 · Mon Dec 26 2016 23:26:20 GMT+0800 (China Standard Time)

@yunhunJang I have read VittalP/UnsupGAN#1, but I do not know how to edit the code. What did you change to get it to work?

Yunhun Jang · Answer 10 · Mon Dec 26 2016 23:31:28 GMT+0800 (China Standard Time)

@lyhangustc I edited line 16 of infogan/misc/custom_ops.py, changing with tf.variable_scope(name) as scope: to with tf.variable_scope(tf.get_variable_scope(), reuse=False) as scope:
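
If you would rather apply that one-line change with a script, a minimal sketch follows. The function name `patch_scope_line` is hypothetical, and the line number is taken from the comment above, so it may differ in your checkout. The function refuses to touch the file contents if the expected statement is not on that line.

```python
OLD = "with tf.variable_scope(name) as scope:"
NEW = "with tf.variable_scope(tf.get_variable_scope(), reuse=False) as scope:"


def patch_scope_line(source, line_no):
    """Replace OLD with NEW on 1-based line `line_no`, keeping indentation.

    Raises ValueError if that line does not contain OLD, so a file whose
    layout differs from the one described in the thread is left unchanged.
    """
    lines = source.splitlines(True)  # True keeps the line endings
    index = line_no - 1
    if not 0 <= index < len(lines) or OLD not in lines[index]:
        raise ValueError("line {} does not contain the expected statement".format(line_no))
    lines[index] = lines[index].replace(OLD, NEW)
    return "".join(lines)
```

To use it, read infogan/misc/custom_ops.py, call `patch_scope_line(text, 16)`, and write the result back. Keep a copy of the original file in case the change does not suit your TensorFlow version.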

lyhangustc · Answer 11 · Mon Dec 26 2016 23:39:53 GMT+0800 (China Standard Time)

@yunhunJang It works. Thank you!

Kaihu Chen · Answer 12 · Fri Dec 30 2016 02:22:18 GMT+0800 (China Standard Time)

I also have a problem running the MNIST experiment, but the symptom looks different from the above:

$ PYTHONPATH='.' python launchers/run_mnist_exp.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.4.0.7 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.7.5 locally
Extracting MNIST/train-images-idx3-ubyte.gz
Extracting MNIST/train-labels-idx1-ubyte.gz
Extracting MNIST/t10k-images-idx3-ubyte.gz
Extracting MNIST/t10k-labels-idx1-ubyte.gz
--Return--
None
> /mnt/ml/tests/InfoGAN/infogan/misc/custom_ops.py(121)__call__()
    117                                        init=tf.random_normal_initializer(stddev=stddev))
    118                 bias = self.variable("bias", [output_size], init=tf.constant_initializer(bias_start))
    119                 return input_layer.with_tensor(tf.matmul(input_, matrix) + bias, parameters=self.vars)
    120         except Exception:
--> 121             import ipdb; ipdb.set_trace()

ipdb>  init=tf.random_normal_initializer(stddev=stddev)
ipdb> init
<function _initializer at 0x7f7228067c08>
ipdb> stddev
0.02
ipdb>

Can anybody help? Thanks!

Yeu-Chern Harn · Answer 13 · Thu Jan 12 2017 06:43:05 GMT+0800 (China Standard Time)

@kaihuchen I ran into the same problem as you. I solved it by 1) updating tensorflow to version 0.12.1 and 2) applying the code change provided by @yunhunJang.

Kaihu Chen · Answer 14 · Thu Jan 12 2017 13:06:51 GMT+0800 (China Standard Time)

@frizfealer Got it. Thanks!

Wei Wu · Answer 15 · Wed May 31 2017 11:15:07 GMT+0800 (China Standard Time)

Solved this problem with the suggestion from @pmiller10:
pip uninstall prettytensor
pip install prettytensor==0.6.2

Sumit Dugar · Answer 16 · Wed May 31 2017 16:23:34 GMT+0800 (China Standard Time)

The fix discussed above worked for tensorflow 1.0.1, but after I upgraded tensorflow to 1.2 I got the same error again. I tried a few versions between 1.0.1 and 1.2 but still got the same error.

Wei Wu · Answer 17 · Wed May 31 2017 17:28:47 GMT+0800 (China Standard Time)

Just use tf version r0.9rc0-devel from the README: https://github.com/openai/InfoGAN#running-in-docker

Zak Jost · Answer 18 · Mon Sep 25 2017 09:59:46 GMT+0800 (China Standard Time)

I forked this repo and made the changes to use tensorflow 1.3.0. You can find it here. But this diff shows the changes.

I also needed to change a part of prettytensor that involved unpacking the trace. Specifically, I modified prettytensor/pretty_tensor_class.py to add a try/except block and then instead of unpacking the trace by assuming 4 elements, I just assign the "f", "line_no", and "method" by indexing a tuple so that len(result._traceback) != 4 doesn't break it.

1337   try:
1338       for traceback in result._traceback:
1339         f = traceback[0]
1340         line_no = traceback[1]
1341         method = traceback[2]
1342         if (method in ('_replace_deferred', '_construct') and
1343             f.endswith('pretty_tensor_class.py')):
1344           found = True
1345           continue
1346         trace.append((f, line_no, method, {}))
1347       result._traceback = trace
1348   except:
1349       print("Traceback: ", result._traceback)