davidADSP / GDL_code

The official code repository for examples in the O'Reilly book 'Generative Deep Learning'


ValueError: operands could not be broadcast together with shapes (32,) (7,) (32,)

puigalex opened this issue

I'm getting the following error when finishing the first epoch of the 03_05_vae_faces_train notebook. Training runs fine throughout the first epoch, but the error is raised at the transition to epoch 2.

ValueError: operands could not be broadcast together with shapes (32,) (7,) (32,)

  • TensorFlow-GPU 2
  • Ubuntu 18.04
  • NVIDIA 1050 Ti

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
5 , run_folder = RUN_FOLDER
6 , print_every_n_batches = PRINT_EVERY_N_BATCHES
----> 7 , initial_epoch = INITIAL_EPOCH
8 )

~/Documents/Generative/tf2/GDL_code/models/VAE.py in train_with_generator(self, data_flow, epochs, steps_per_epoch, run_folder, print_every_n_batches, initial_epoch, lr_decay)
249 , initial_epoch = initial_epoch
250 , callbacks = callbacks_list
--> 251 , steps_per_epoch=steps_per_epoch
252 )
253

~/anaconda3/envs/generativetf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
64 def _method_wrapper(self, *args, **kwargs):
65 if not self._in_multi_worker_mode(): # pylint: disable=protected-access
---> 66 return method(self, *args, **kwargs)
67
68 # Running inside run_distribute_coordinator already.

~/anaconda3/envs/generativetf2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
853 context.async_wait()
854 logs = tmp_logs # No error, now safe to assign to logs.
--> 855 callbacks.on_train_batch_end(step, logs)
856 epoch_logs = copy.copy(logs)
857

~/anaconda3/envs/generativetf2/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py in on_train_batch_end(self, batch, logs)
388 if self._should_call_train_batch_hooks:
389 logs = self._process_logs(logs)
--> 390 self._call_batch_hook(ModeKeys.TRAIN, 'end', batch, logs=logs)
391
392 def on_test_batch_begin(self, batch, logs=None):

~/anaconda3/envs/generativetf2/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py in _call_batch_hook(self, mode, hook, batch, logs)
296 for callback in self.callbacks:
297 batch_hook = getattr(callback, hook_name)
--> 298 batch_hook(batch, logs)
299 self._delta_ts[hook_name].append(time.time() - t_before_callbacks)
300

~/anaconda3/envs/generativetf2/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py in on_train_batch_end(self, batch, logs)
882
883 def on_train_batch_end(self, batch, logs=None):
--> 884 self._batch_update_progbar(logs)
885
886 def on_test_batch_end(self, batch, logs=None):

~/anaconda3/envs/generativetf2/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py in _batch_update_progbar(self, logs)
926 add_seen = num_steps if self.use_steps else num_steps * batch_size
927 self.seen += add_seen
--> 928 self.progbar.update(self.seen, list(logs.items()), finalize=False)
929
930 def _finalize_progbar(self, logs):

~/anaconda3/envs/generativetf2/lib/python3.6/site-packages/tensorflow/python/keras/utils/generic_utils.py in update(self, current, values, finalize)
570 self._values[k] = [v * value_base, value_base]
571 else:
--> 572 self._values[k][0] += v * value_base
573 self._values[k][1] += value_base
574 else:

ValueError: operands could not be broadcast together with shapes (32,) (7,) (32,)
```

I have the same error when using the tensorflow_2 branch. Did you find a solution?

Thanks


I'll rewrite the code to see what's going on. The temporary workaround I found was adding verbose=0 to the fit call inside vae.train_with_generator() and turning off the callbacks. It works, but you then have no indication of whether the model is training properly or which epoch you're in.
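For reference, a minimal sketch of that change inside models/VAE.py, assuming train_with_generator() calls self.model.fit() with the arguments visible in the traceback (everything else here is a guess at the surrounding code):

```python
# models/VAE.py, inside train_with_generator() -- sketch of the workaround
self.model.fit(
    data_flow
    , epochs = epochs
    , initial_epoch = initial_epoch
    , callbacks = []                    # temporarily drop callbacks_list
    , steps_per_epoch = steps_per_epoch
    , verbose = 0                       # silence the progress bar that raises the error
)
```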

I had the same issue. The root cause is easy to reproduce. For example, if the batch size is 16 and there are 200 training files in total, the last batch holds only 200 % 16 = 8 files, so v carries only 8 loss values, and the broadcast fails:

```
ValueError: operands could not be broadcast together with shapes (16,) (8,) (16,)
```

The failing line is in tensorflow/python/keras/utils/generic_utils.py:

```python
self._values[k][0] += v * value_base
```
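The broadcast failure itself can be reproduced outside Keras. The progress bar keeps a running total sized to the first batch's per-sample loss array, and the short final batch cannot be added to it:

```python
import numpy as np

acc = np.zeros(16)   # running totals, sized to the first (full) batch
v = np.zeros(8)      # per-sample losses from the short last batch (200 % 16 = 8)

acc += v * 8         # ValueError: operands could not be broadcast together
                     # with shapes (16,) (8,) (16,)
```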

The quick workaround is to make sure the number of training files is a multiple of the batch size.

number "7" may means NUM_IMAGES % BATCH_SIZE. so I set a number so that NUM_IMAGES is divisible by BATCH_SIZE, and then this issue seemed to be resolved.

Do you have to delete some of the image files so that the total matches NUM_IMAGES?