dome272 / Paella

Official Implementation of Paella https://arxiv.org/abs/2211.07292v2


questions on parameters

metaphorz opened this issue

Some questions and observations after trying this out. Nice notebook! This is not really an issue, but there is no Discussions tab in your GitHub repo.

  1. Is there a way to specify additional hyperparameters such as seed, image size, iterations...?
  2. When you set the batch size to 1, the resulting image is displayed huge (not at its actual resolution)
  3. It would be great if the output (under /content/output in colab) could be one image. I understand that I can set the batch size to 1, but I am looking to put in a number and have that many images saved under /content/output. Having the images glued together horizontally is less appealing (for me).

Hi Paul! I'm glad you're enjoying our work! Let me try to solve these for you!

Is there a way to specify additional hyperparameters such as seed, image size, iterations...?

To use a seed, you can rely on PyTorch's random seeding; for example, you could do something like:

with torch.random.fork_rng():
    torch.manual_seed(42)
    sampled = sample(...)
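
(Using torch.random.fork_rng() keeps the seeding local: the global RNG state is restored when the with block exits, so the rest of the notebook is unaffected.)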

As for image size, iterations, etc.: yes, the sample method has parameters like:

  - T & renoise_steps (12 and 11 by default) set the number of sampling iterations; we find that for better results renoise_steps should be equal to T-1.
  - size, a tuple ((32, 32) by default), defines the size of the latent tokens that Paella will sample; the VQGAN then decodes them into an image at 8x that resolution, so if you want to try a landscape image you can use size=(32, 64).
  - There are other parameters, like a mask (for inpainting), an initial image (for image2image), etc.; take a look here: https://github.com/dome272/Paella/blob/main/utils.py#L29
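
For example, a minimal sketch of such a call (the exact signature is the one in utils.py linked above; the keyword names below just follow the description in this reply and may differ from the actual code):

# Hypothetical call: keyword names follow the description above;
# check utils.py for the real signature.
sampled = sample(
    ...,                # the conditioning arguments the notebook already passes
    T=12,               # number of sampling iterations
    renoise_steps=11,   # usually T - 1 gives the best results
    size=(32, 64),      # latent size; the VQGAN decodes it 8x larger (a 256x512 image)
)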

When you set the batch size to 1, the resulting image is displayed huge (not at its actual resolution)

This is because we use matplotlib to display the images in the notebook. You can set a height & width in the showimages method, but you probably won't get the actual image size. The best way to get the actual size is to save the images, for example with torchvision.utils.save_image or our provided saveimages method, which brings me to the third question...
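
If you just want the images on disk at their native resolution, torchvision can write the decoded batch directly; a minimal sketch, assuming sampled is a float image tensor of shape [B, 3, H, W] with values in [0, 1]:

import torchvision

# Saves the whole batch as one PNG grid at the images' real resolution
torchvision.utils.save_image(sampled, "/content/output/sample.png", nrow=len(sampled))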

It would be great if the output (under /content/output in colab) could be one image. I understand that I can set the batch size to 1, but I am looking to put in a number and have that many images saved under /content/output. Having the images glued together horizontally is less appealing (for me).

The function saveimages expects a batch of images and will automatically make a grid, but you can instead call it with each image separately:

for i, img in enumerate(sampled):
    saveimages(img, mode + "_" + text + f"_{i}", nrow=len(sampled))

This should create a file for each image in the batch.
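
Alternatively, if you prefer not to go through saveimages at all, each image can be written with torchvision directly; a sketch assuming the decoded tensors are in [0, 1] and that text holds the prompt string used in the notebook:

import torchvision

for i, img in enumerate(sampled):
    # img is a single [3, H, W] image; save_image accepts single images as well as batches
    torchvision.utils.save_image(img, f"/content/output/{text}_{i}.png")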

Pablo: This is a really nice, detailed response. Thanks! I set the seed as specified and it is working. I'm not sure what nominal values T and renoise_steps can be set to. For size, on an A100 I was able to do (64, 64) at most, which yields a 512x512 image. Your loop suggestion worked great, as I now get individual output files.

Hey there, technically you should be able to sample higher-resolution images; I can sample 128x128 latents on an A100. Maybe you can try removing the CLIP visual part if you only need text-conditional sampling, like here: https://github.com/dome272/Paella/blob/main/paella_minimal.py#L34
clip.visual is a >1B-parameter model which takes up a lot of memory, so maybe that helps?
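
For reference, a rough sketch of what dropping the visual tower could look like, assuming the notebook keeps the CLIP model in a variable called clip_model (the actual variable name may differ):

import gc
import torch

# Keep only the text encoder: drop the >1B-parameter image tower and free its GPU memory
del clip_model.visual
gc.collect()
torch.cuda.empty_cache()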