eladrich / pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework

Home Page: https://eladrich.github.io/pixel2style2pixel/

RuntimeError: Given groups=1, weight of size [64, 1, 3, 3], expected input[8, 3, 256, 256] to have 1 channels, but got 3 channels instead

catherineyeh opened this issue · comments

Hi! I'm running into a problem similar to this issue: #156 .
I am trying to train a pSp encoder for the super-resolution task with my own dataset (256x256, grayscale).
Here are my parameters:

{'batch_size': 8,
 'board_interval': 50,
 'checkpoint_path': None,
 'dataset_type': 'mydataset_type',
 'encoder_type': 'GradualStyleEncoder',
 'exp_dir': 'exp',
 'id_lambda': 0.0,
 'image_interval': 100,
 'input_nc': 1,
 'l2_lambda': 1.0,
 'l2_lambda_crop': 0,
 'label_nc': 1,
 'learn_in_w': False,
 'learning_rate': 0.0001,
 'lpips_lambda': 0.8,
 'lpips_lambda_crop': 0,
 'max_steps': 500000,
 'moco_lambda': 0,
 'optim_name': 'ranger',
 'output_size': 1024,
 'resize_factors': '1,2,4,8',
 'save_interval': 5000,
 'start_from_latent_avg': True,
 'stylegan_weights': 'pretrained_models/exported25000pkl.pt',
 'test_batch_size': 8,
 'test_workers': 8,
 'train_decoder': False,
 'val_interval': 2500,
 'w_norm_lambda': 0.005,
 'workers': 8}
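For context, this corresponds roughly to the following training command (a reconstruction from the options above, not the exact command I ran; the dataset type and weights path are specific to my setup):

python scripts/train.py \
    --dataset_type=mydataset_type \
    --exp_dir=exp \
    --encoder_type=GradualStyleEncoder \
    --input_nc=1 \
    --label_nc=1 \
    --output_size=1024 \
    --batch_size=8 \
    --test_batch_size=8 \
    --workers=8 \
    --test_workers=8 \
    --lpips_lambda=0.8 \
    --l2_lambda=1 \
    --w_norm_lambda=0.005 \
    --id_lambda=0 \
    --resize_factors=1,2,4,8 \
    --start_from_latent_avg \
    --stylegan_weights=pretrained_models/exported25000pkl.pt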

After it starts loading my custom dataset, I get the following traceback:

Traceback (most recent call last):
  File "scripts/train.py", line 32, in <module>
    main()
  File "scripts/train.py", line 28, in main
    coach.train()
  File "./training/coach.py", line 83, in train
    y_hat, latent = self.net.forward(x, return_latents=True)
  File "./models/psp.py", line 92, in forward
    codes = self.encoder(x)
  File "/home/cyeh/psp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "./models/encoders/psp_encoders.py", line 91, in forward
    x = self.input_layer(x)
  File "/home/cyeh/psp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cyeh/psp/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/cyeh/psp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cyeh/psp/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/cyeh/psp/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 1, 3, 3], expected input[8, 3, 256, 256] to have 1 channels, but got 3 channels instead

I specified input_nc as 1 since I am using grayscale images as inputs, but based on the error message the encoder seems to be receiving input of size [8, 3, 256, 256]. I'm not sure what the root cause is.
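If I read the error right, the weight size [64, 1, 3, 3] is [out_channels, in_channels, kernel_h, kernel_w], so the encoder's first conv was indeed built for 1 input channel (matching input_nc=1) and the mismatch must be on the data side. A minimal sketch that reproduces the same error:

import torch
import torch.nn as nn

# first conv built for grayscale input, as input_nc=1 requests
conv = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, padding=1)

x = torch.randn(8, 3, 256, 256)  # ...but the batch arrives with 3 channels
conv(x)  # RuntimeError: expected input[8, 3, 256, 256] to have 1 channels, but got 3 channels instead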

Your parameters do look correct for grayscale images.
Could you try running in debug mode and putting a breakpoint here:

y_hat, latent = self.net.forward(x, return_latents=True)

Let's see what the dimensions of x are, to check whether something in the dataset is not working as expected.
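For example, right before that call (just a one-line sanity check):

print(x.shape)  # for input_nc=1 we would expect torch.Size([8, 1, 256, 256])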

Thanks for the reply!

This was the dimension of x:
torch.Size([8, 3, 256, 256])

It seems like the dataset has 3 channels... but when I check the image information, the files look like they are grayscale...
I'll try with the RGB settings and see if any issue arises. Thanks!
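For reference, this is roughly how I checked the image information (the path is just an example; a file can look grayscale on screen but still be stored with 3 channels, in which case PIL reports mode 'RGB' rather than 'L'):

from PIL import Image

im = Image.open('path/to/sample.png')  # example path from my dataset
print(im.mode)  # 'L' = single channel, 'RGB' = 3 channels
print(im.size)  # (256, 256)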

For RGB inputs, would the parameters be 'input_nc': 3?

I wouldn't give up just yet on training with grayscale images.
Try putting a breakpoint in the __getitem__ function of the dataset and check how PIL is reading the images:

def __getitem__(self, index):
    from_path = self.source_paths[index]
    from_im = Image.open(from_path)
    from_im = from_im.convert('RGB') if self.opts.label_nc == 0 else from_im.convert('L')
    to_path = self.target_paths[index]
    to_im = Image.open(to_path).convert('RGB')
    if self.target_transform:
        to_im = self.target_transform(to_im)
    if self.source_transform:
        from_im = self.source_transform(from_im)
    else:
        from_im = to_im
    return from_im, to_im

I think I actually see the problem in the following line:

to_im = Image.open(to_path).convert('RGB')

It is unconditionally converting to_im to RGB. You could try doing something like:

to_im = Image.open(to_path)
to_im = to_im.convert('RGB') if self.opts.label_nc == 0 else to_im.convert('L') 
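The source/target transforms may also need to match a single channel; for instance, a normalization along these lines (a sketch assuming torchvision transforms; your exact pipeline may differ):

from torchvision import transforms

# single-channel mean/std instead of the (0.5, 0.5, 0.5) triplets used for RGB
grayscale_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])])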

Thanks! Your suggestion plus modifying the transformations solved that problem! Now I'm getting a new error:

Traceback (most recent call last):
  File "scripts/train.py", line 32, in <module>
    main()
  File "scripts/train.py", line 28, in main
    coach.train()
  File "./training/coach.py", line 85, in train
    y_hat, latent = self.net.forward(x, return_latents=True)
  File "./models/psp.py", line 98, in forward
    codes = codes + self.latent_avg.repeat(codes.shape[0], 1, 1)
RuntimeError: The size of tensor a (14) must match the size of tensor b (18) at non-singleton dimension 1

I tried removing the "--start_from_latent_avg" flag, but then got:

./training/ranger.py:123: UserWarning: This overload of addcmul_ is deprecated:
        addcmul_(Number value, Tensor tensor1, Tensor tensor2)
Consider using one of the following signatures instead:
        addcmul_(Tensor tensor1, Tensor tensor2, *, Number value) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
Traceback (most recent call last):
  File "/home/cyeh/psp/lib/python3.6/site-packages/PIL/Image.py", line 2680, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 1), '|u1')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/train.py", line 32, in <module>
    main()
  File "scripts/train.py", line 28, in main
    coach.train()
  File "./training/coach.py", line 94, in train
    self.parse_and_log_images(id_logs, x, y, y_hat, title='images/train/faces')
  File "./training/coach.py", line 241, in parse_and_log_images
    'target_face': common.tensor2im(y[i]),
  File "./utils/common.py", line 23, in tensor2im
    return Image.fromarray(var.astype('uint8'))
  File "/home/cyeh/psp/lib/python3.6/site-packages/PIL/Image.py", line 2682, in fromarray
    raise TypeError("Cannot handle this data type: %s, %s" % typekey)
TypeError: Cannot handle this data type: (1, 1, 1), |u1   

Regarding the first error,

RuntimeError: The size of tensor a (14) must match the size of tensor b (18) at non-singleton dimension 1

This is caused by using a generator that has 14 latent codes (i.e., one trained for 256x256 output), while --output_size=1024 implies 18. Did you maybe mean to set --output_size=256?
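For reference, the number of style vectors follows directly from the output resolution; pSp derives it as roughly 2 * log2(output_size) - 2, matching StyleGAN2's layer count. A quick sketch:

import math

def n_styles(output_size):
    # number of W+ style vectors for a StyleGAN2 generator at this resolution
    return int(math.log2(output_size)) * 2 - 2

print(n_styles(256))   # 14 -> what your generator checkpoint provides
print(n_styles(1024))  # 18 -> what --output_size=1024 expects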
Regarding the other errors related to PIL, these are pretty common issues. My best advice is to run your code in debug mode, see where it fails, and Google the errors to find the correct fix. The issues are most likely caused by your attempt to adapt the code to work with grayscale images. Nevertheless, the solution should be quite simple to find.
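As a concrete pointer for the PIL one: KeyError: ((1, 1, 1), '|u1') means Image.fromarray received an H x W x 1 uint8 array, which PIL cannot map to an image mode; squeezing the trailing channel dimension and passing mode='L' is the usual fix. A hypothetical grayscale variant of tensor2im (an illustration, not the repo's code):

import numpy as np
from PIL import Image

def tensor2im_gray(var):
    # var: a [1, H, W] tensor with values in [-1, 1]
    arr = var.cpu().detach().numpy().transpose(1, 2, 0)  # -> [H, W, 1]
    arr = np.clip((arr + 1) / 2 * 255, 0, 255).astype('uint8')
    return Image.fromarray(arr.squeeze(-1), mode='L')  # drop the singleton channel axis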

Adjusting the output_size resolved the issue, thanks a lot!
I'll close this issue for now.