davidADSP / GDL_code

The official code repository for examples in the O'Reilly book 'Generative Deep Learning'

03_05_vae_faces_train Error during training

takeofuture opened this issue · comments

I tried to run this on Google Colab (TF 2.3, branch tensorflow_2).
I downloaded the images from https://www.kaggle.com/jessicali9530/celeba-dataset,
placed only about 1,000 jpg files, and changed the number of epochs from 200 to 10.
The images are 178 x 218 pixels.

I got the following error:

Epoch 1/10
16/31 [==============>...............] - ETA: 1:49 - loss: 938.0780 - reconstruction_loss: 937.9746 - kl_loss: 0.1033

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-12-6b00e1cc66a8> in <module>()
      5     , run_folder = RUN_FOLDER
      6     , print_every_n_batches = PRINT_EVERY_N_BATCHES
----> 7     , initial_epoch = INITIAL_EPOCH
      8 )

9 frames

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/generic_utils.py in update(self, current, values, finalize)
    556           self._values[k] = [v * value_base, value_base]
    557         else:
--> 558           self._values[k][0] += v * value_base
    559           self._values[k][1] += value_base
    560       else:

ValueError: operands could not be broadcast together with shapes (32,) (10,) (32,) 

I am not sure whether this is a bug in the utility code. Has anybody run this successfully?
(I am wondering whether the number of images is too small, or whether something else is wrong.)

As far as I can tell, this is caused by the number of input images not being an integer multiple of the batch size.

It works if the number of input images is, for example, 32, ..., 192, ...

Non-integer multiples of the batch size fail as follows:

200 input images:

ValueError: operands could not be broadcast together with shapes (32,) (8,) (32,)

600 input images:

ValueError: operands could not be broadcast together with shapes (32,) (24,) (32,)

Please let me know if you manage to make it work for input counts that are not integer multiples of the batch size! :)
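One way to avoid the partial final batch altogether is to feed fit() from a tf.data pipeline that drops the remainder. This is only a sketch of an alternative input pipeline, not the notebook's original ImageDataGenerator code; DATA_FOLDER and INPUT_DIM are assumed values, and the (x, x) mapping may be needed depending on how the model's train_step unpacks its data:

import tensorflow as tf

DATA_FOLDER = '/content/data_faces/img_align_celeba/'  # assumed location
BATCH_SIZE = 32
INPUT_DIM = (128, 128)  # assumed target size

def load_image(path):
    # read, decode, resize and scale one jpg to [0, 1]
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, INPUT_DIM)
    return tf.cast(img, tf.float32) / 255.0

files = tf.data.Dataset.list_files(DATA_FOLDER + '*.jpg', shuffle=True)
images = files.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
# images = images.map(lambda x: (x, x))  # if train_step expects (input, target) pairs
# drop_remainder=True discards the short final batch, so every per-sample metric
# tensor has shape (BATCH_SIZE,) and the progress bar can accumulate them
data_flow = images.batch(BATCH_SIZE, drop_remainder=True).prefetch(1)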

I got another problem:
"ValueError: Layer decoder expects 1 input(s), but it received 3 input tensors. Inputs received: [<tf.Tensor: shape=(32, 200), dtype=float32, numpy=...."

What's wrong? Thanks!

Revise the code:

class VAEModel(Model):
    ...
    def call(self, inputs):
        # the encoder returns three tensors; unpack all three and pass
        # only the sampled latent vector to the decoder
        z_mean, z_log_var, latent = self.encoder(inputs)
        return self.decoder(latent)
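
For context: as the error message says, the encoder returns three tensors (z_mean, z_log_var and the sampled latent vector), while the decoder accepts exactly one input, so the tuple must be unpacked. A quick interactive check (the vae variable and the shapes are illustrative, taken from the error message):

z_mean, z_log_var, latent = vae.encoder(batch)  # three tensors, each e.g. (32, 200)
reconstruction = vae.decoder(latent)            # the decoder gets exactly one tensor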

What's my case?

ValueError: operands could not be broadcast together with shapes (32,) (7,) (32,)

@daiyl consider decreasing your number of input images by 7, or increasing it by 25, so that the count becomes an integer multiple of your batch size of 32.

@Zindyrella Thanks for your reply. However, where can I control the number of input images? I cannot find the code that controls it, and the ValueError seems to happen at random.

@daiyl - In case you are still having this issue, below are the steps I used to create a workaround:

import os
from glob import glob
import numpy as np

DATA_FOLDER = '/content/data_faces/img_align_celeba/'  # assuming that you have your images in this folder
BATCH_SIZE = 32
filenames = np.array(glob(os.path.join(DATA_FOLDER, '*.jpg')))
NUM_IMAGES = len(filenames)

print(' -- NUM_IMAGES :', NUM_IMAGES)
print(' ---- NUM_IMAGES / BATCH_SIZE :', NUM_IMAGES / BATCH_SIZE)

Now, assuming that the sample output is as below:
 -- NUM_IMAGES : 202599
 ---- NUM_IMAGES / BATCH_SIZE : 6331.21875

So we have 0.21875 * 32 = 7 excess images; equivalently, we are (1 - 0.21875) * 32 = 25 images short of a whole number of batches.
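
The same arithmetic as a quick check in code (illustrative only, using the sample numbers above):

excess = 202599 % 32       # 7 images beyond the last full batch
shortfall = 32 - excess    # 25 dummy images needed to complete it
print(excess, shortfall)   # -> 7 25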

So either create 25 dummy images from the notebook with shell commands (!cp, %cd, etc.) as below, or alternatively delete 7 images. Since the dataset is huge, neither adding 25 duplicate images nor removing 7 should skew the results.

# copy the first image 25 times to create dummies D1.jpg ... D25.jpg
!for i in $(seq 1 25); do cp 000001.jpg D$i.jpg; done

This may not be a good solution, but it is a quick fix to move ahead.

Thanks
Satya
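
For completeness, here is the deletion variant Satya mentions, as a short Python sketch. This is only an assumed approach (and destructive: it removes files, so run it on a scratch copy of the dataset; the path matches the one assumed above):

import os
from glob import glob

BATCH_SIZE = 32
filenames = sorted(glob('/content/data_faces/img_align_celeba/*.jpg'))
excess = len(filenames) % BATCH_SIZE  # 7 in the example above
# remove the excess images so only whole batches remain
for path in filenames[len(filenames) - excess:]:
    os.remove(path)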

I have the same error; the suggested fix does not work for me!