davidADSP / GDL_code

The official code repository for examples in the O'Reilly book 'Generative Deep Learning'

03_05_vae_faces_train Error during training

takeofuture opened this issue · comments

I tried to run this on Google Colab (TF 2.3, branch tensorflow_2).
I downloaded the images from https://www.kaggle.com/jessicali9530/celeba-dataset,
placed only about 1,000 jpg files, and changed the number of epochs from 200 to 10.
The images are 178 x 218 pixels.

I got the following error:

Epoch 1/10
16/31 [==============>...............] - ETA: 1:49 - loss: 938.0780 - reconstruction_loss: 937.9746 - kl_loss: 0.1033

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-12-6b00e1cc66a8> in <module>()
      5     , run_folder = RUN_FOLDER
      6     , print_every_n_batches = PRINT_EVERY_N_BATCHES
----> 7     , initial_epoch = INITIAL_EPOCH
      8 )

9 frames

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/generic_utils.py in update(self, current, values, finalize)
    556           self._values[k] = [v * value_base, value_base]
    557         else:
--> 558           self._values[k][0] += v * value_base
    559           self._values[k][1] += value_base
    560       else:

ValueError: operands could not be broadcast together with shapes (32,) (10,) (32,) 

I am not sure whether this is a bug in the utility code. Has anybody run this successfully?
(I am wondering whether the number of images is too small, or whether something else is wrong.)

As far as I can tell, this is caused by the number of input images not being an integer multiple of the batch size.

It works if the number of input images is, for example, 32, ..., 192, ...

Non-integer multiples of the batch size fail as follows:

200 input images:

ValueError: operands could not be broadcast together with shapes (32,) (8,) (32,)

600 input images:

ValueError: operands could not be broadcast together with shapes (32,) (24,) (32,)

Please let me know if you manage to make it work for input counts that are not integer multiples of the batch size! :)
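One way to avoid the partial final batch altogether is to feed fit() from a tf.data pipeline that drops the remainder. This is only a sketch of an alternative input pipeline, not the notebook's original ImageDataGenerator code; DATA_FOLDER and INPUT_DIM are assumed values, and the (x, x) mapping may be needed depending on how the model's train_step unpacks its data:

import tensorflow as tf

DATA_FOLDER = '/content/data_faces/img_align_celeba/'  # assumed location
BATCH_SIZE = 32
INPUT_DIM = (128, 128)  # assumed target size

def load_image(path):
    # read, decode, resize and scale one jpg to [0, 1]
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, INPUT_DIM)
    return tf.cast(img, tf.float32) / 255.0

files = tf.data.Dataset.list_files(DATA_FOLDER + '*.jpg', shuffle=True)
images = files.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
# images = images.map(lambda x: (x, x))  # if train_step expects (input, target) pairs
# drop_remainder=True discards the short final batch, so every per-sample metric
# tensor has shape (BATCH_SIZE,) and the progress bar can accumulate them
data_flow = images.batch(BATCH_SIZE, drop_remainder=True).prefetch(1)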

I got another problem:
"ValueError: Layer decoder expects 1 input(s), but it received 3 input tensors. Inputs received: [<tf.Tensor: shape=(32, 200), dtype=float32, numpy=...."

What's wrong? Thanks!

Revise the code:

class VAEModel(Model):
    ...
    def call(self, inputs):
        # the encoder returns three tensors; unpack all three and pass
        # only the sampled latent vector to the decoder
        z_mean, z_log_var, latent = self.encoder(inputs)
        return self.decoder(latent)
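
For context: as the error message says, the encoder returns three tensors (z_mean, z_log_var and the sampled latent vector), while the decoder accepts exactly one input, so the tuple must be unpacked. A quick interactive check (the vae variable and the shapes are illustrative, taken from the error message):

z_mean, z_log_var, latent = vae.encoder(batch)  # three tensors, each e.g. (32, 200)
reconstruction = vae.decoder(latent)            # the decoder gets exactly one tensor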

What's my case?

ValueError: operands could not be broadcast together with shapes (32,) (7,) (32,)

@daiyl consider decreasing your number of input images by 7, or increasing it by 25, so that the count becomes an integer multiple of your batch size of 32.

@Zindyrella Thanks for your reply. However, where can I control the number of input images? I cannot find the code that controls it, and the ValueError seems to happen at random.

@daiyl - In case you are still having this issue, below are the steps I used to create a workaround:

import os
from glob import glob
import numpy as np

DATA_FOLDER = '/content/data_faces/img_align_celeba/'  # assuming that you have your images in this folder
BATCH_SIZE = 32
filenames = np.array(glob(os.path.join(DATA_FOLDER, '*.jpg')))
NUM_IMAGES = len(filenames)

print(' -- NUM_IMAGES :', NUM_IMAGES)
print(' ---- NUM_IMAGES / BATCH_SIZE :', NUM_IMAGES / BATCH_SIZE)

Now, assuming that the sample output is as below:
 -- NUM_IMAGES : 202599
 ---- NUM_IMAGES / BATCH_SIZE : 6331.21875

So we have 0.21875 * 32 = 7 excess images; equivalently, we are (1 - 0.21875) * 32 = 25 images short of a whole number of batches.
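
The same arithmetic as a quick check in code (illustrative only, using the sample numbers above):

excess = 202599 % 32       # 7 images beyond the last full batch
shortfall = 32 - excess    # 25 dummy images needed to complete it
print(excess, shortfall)   # -> 7 25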

So either create 25 dummy images from the notebook with shell commands (!cp, %cd, etc.) as below, or alternatively delete 7 images. Since the dataset is huge, neither adding 25 duplicate images nor removing 7 should skew the results.

# copy the first image 25 times to create dummies D1.jpg ... D25.jpg
!for i in $(seq 1 25); do cp 000001.jpg D$i.jpg; done

This may not be a good solution, but it is a quick fix to move ahead.

Thanks
Satya
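
For completeness, here is the deletion variant Satya mentions, as a short Python sketch. This is only an assumed approach (and destructive: it removes files, so run it on a scratch copy of the dataset; the path matches the one assumed above):

import os
from glob import glob

BATCH_SIZE = 32
filenames = sorted(glob('/content/data_faces/img_align_celeba/*.jpg'))
excess = len(filenames) % BATCH_SIZE  # 7 in the example above
# remove the excess images so only whole batches remain
for path in filenames[len(filenames) - excess:]:
    os.remove(path)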

I have the same error; the suggested fix does not work for me!