openai / glide-text2im

GLIDE: a diffusion-based text-conditional image synthesis model


Higher Resolution

jamahun opened this issue · comments

Is there a way to upscale the outputs to something closer to 1024px? I've noticed a few people on Twitter who have been able to do so with this model, but after trying to change the image size to a higher value, I get this error for anything over 256:

/usr/local/lib/python3.7/dist-packages/glide_text2im/model_creation.py in create_model(image_size, num_channels, num_res_blocks, channel_mult, attention_resolutions, num_heads, num_head_channels, num_heads_upsample, use_scale_shift_norm, dropout, text_ctx, xf_width, xf_layers, xf_heads, xf_final_ln, xf_padding, resblock_updown, use_fp16, cache_text_emb, inpaint, super_res)
    140             channel_mult = (1, 2, 3, 4)
    141         else:
--> 142             raise ValueError(f"unsupported image size: {image_size}")
    143     else:
    144         channel_mult = tuple(int(ch_mult) for ch_mult in channel_mult.split(","))

ValueError: unsupported image size: 1024

commented

The error happens at line 142 because channel_mult is the empty string "" and image_size is not one of 256, 128, or 64:

if channel_mult == "":
    if image_size == 256:
        channel_mult = (1, 1, 2, 2, 4, 4)
    elif image_size == 128:
        channel_mult = (1, 1, 2, 3, 4)
    elif image_size == 64:
        channel_mult = (1, 2, 3, 4)
    else:
        raise ValueError(f"unsupported image size: {image_size}")
else:
    channel_mult = tuple(int(ch_mult) for ch_mult in channel_mult.split(","))
    assert 2 ** (len(channel_mult) + 2) == image_size

Maybe try to set channel_mult to a non-empty string.
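To make the behavior of that else branch concrete, here is a minimal sketch of how a non-empty channel_mult string gets parsed (the helper name parse_channel_mult is mine for illustration; in the library this logic lives inline in create_model):

```python
def parse_channel_mult(channel_mult: str, image_size: int) -> tuple:
    """Mimic the else-branch of create_model: split a comma-separated
    string into a tuple of ints and check it matches the image size."""
    mult = tuple(int(ch_mult) for ch_mult in channel_mult.split(","))
    # Same consistency check as the assert in create_model:
    assert 2 ** (len(mult) + 2) == image_size
    return mult

# A non-empty string skips the "unsupported image size" branch entirely:
print(parse_channel_mult("1,2,3,4", 64))  # (1, 2, 3, 4)
```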

Thanks so much for your response @woctezuma. I did see this part of the script but didn't really know how I could change it. I've only got a very basic understanding of Python and programming languages in general. Any chance you could give me an example of how to change channel_mult to a non-empty string?

commented

I don't know:

  • what would be the correct value for channel_mult for 1024x1024 resolution,
  • or whether it would make sense to supply a value for 1024x1024 resolution.

However, I can make a few remarks about values which would work with the else statement and pass the assert check.

First, the string should be a sequence of integers separated by commas.
For instance, channel_mult = "1,2,3,4" would be correctly parsed and transformed into the tuple (1, 2, 3, 4).

Second, the length of the resulting tuple should be equal to log2(image_size) - 2, so that the assert check passes.
For instance, for an image resolution of 64, the length of the tuple is log2(64) - 2 = 6 - 2 = 4.
This is consistent with the tuple mentioned above, i.e. (1, 2, 3, 4).

For an image resolution of 1024, the length of the tuple should be log2(1024) - 2 = 10 - 2 = 8.

Personally, based on the examples mentioned above, I would try one of the following tuples:

  • (1, 1, 2, 2, 3, 3, 4, 4)
  • (1, 1, 1, 2, 2, 3, 4, 4)
  • (1, 1, 1, 1, 2, 2, 4, 4)

These correspond to the following input strings:

  • "1,1,2,2,3,3,4,4"
  • "1,1,1,2,2,3,4,4"
  • "1,1,1,1,2,2,4,4"

We haven't trained an upsampler for higher resolutions. You can't just change the code and load the old upsampler model; it won't work, because it was trained for 256x256.

People on Twitter have been using third-party upsamplers / image super resolution models.