Higher Resolution
jamahun opened this issue
Is there a way to upsize the outputs to something closer to 1024px? I've noticed a few people on Twitter who have been able to do so with this model, but after trying to change the image size to a higher value, I get this error for anything over 256:
```
/usr/local/lib/python3.7/dist-packages/glide_text2im/model_creation.py in create_model(image_size, num_channels, num_res_blocks, channel_mult, attention_resolutions, num_heads, num_head_channels, num_heads_upsample, use_scale_shift_norm, dropout, text_ctx, xf_width, xf_layers, xf_heads, xf_final_ln, xf_padding, resblock_updown, use_fp16, cache_text_emb, inpaint, super_res)
    140             channel_mult = (1, 2, 3, 4)
    141         else:
--> 142             raise ValueError(f"unsupported image size: {image_size}")
    143     else:
    144         channel_mult = tuple(int(ch_mult) for ch_mult in channel_mult.split(","))

ValueError: unsupported image size: 1024
```
The error happens at line 142 because `channel_mult` is `""` and `image_size` is not one of 256, 128, or 64.
glide-text2im/glide_text2im/model_creation.py, lines 134 to 145 in 9cc8e56
Maybe try to set `channel_mult` to a non-empty string.
Thanks so much for your response @woctezuma. I did see this part of the script but didn't really know how I could change it. I have a very basic understanding of Python and programming languages in general; any chance you could give me an example of how to change `channel_mult` to a non-empty string?
I don't know:
- what the correct value of `channel_mult` would be for 1024x1024 resolution,
- or whether it would make sense to supply a value for 1024x1024 resolution at all.
However, I can make a few remarks about values which would work with the `else` branch and pass the `assert` check.
First, the string should be a sequence of integers separated by commas.
For instance, `channel_mult = "1,2,3,4"` would be correctly parsed and transformed into `(1, 2, 3, 4)`.
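The parsing in the `else` branch can be sketched as follows; the parse expression is copied from the traceback above, the rest is a minimal standalone example:

```python
# Minimal sketch of how a comma-separated channel_mult string is parsed,
# mirroring line 144 shown in the traceback above.
channel_mult = "1,2,3,4"
parsed = tuple(int(ch_mult) for ch_mult in channel_mult.split(","))
print(parsed)  # (1, 2, 3, 4)
```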
Second, the length of the parsed tuple should be equal to `log2(image_size) - 2`.
For instance, for an image resolution of 64, the length of the tuple is `log2(64) - 2 = 6 - 2 = 4`.
This is consistent with the tuple mentioned above, i.e. `(1, 2, 3, 4)`.
For an image resolution of 1024, the length of the tuple should be `log2(1024) - 2 = 10 - 2 = 8`.
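As a quick sanity check of the arithmetic above, the expected tuple length can be computed with a small helper (`required_channel_mult_len` is a hypothetical name, not part of the library):

```python
import math

def required_channel_mult_len(image_size: int) -> int:
    # Hypothetical helper: per the rule discussed above, the parsed
    # channel_mult tuple should have log2(image_size) - 2 entries.
    return int(math.log2(image_size)) - 2

print(required_channel_mult_len(64))    # 4
print(required_channel_mult_len(256))   # 6
print(required_channel_mult_len(1024))  # 8
```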
Personally, based on the examples mentioned above, I would try one of the following tuples:
(1, 1, 2, 2, 3, 3, 4, 4)
(1, 1, 1, 2, 2, 3, 4, 4)
(1, 1, 1, 1, 2, 2, 4, 4)
That would require one of the following input strings:
"1,1,2,2,3,3,4,4"
"1,1,1,2,2,3,4,4"
"1,1,1,1,2,2,4,4"
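Assuming the length rule above, each of those candidate strings can be checked locally before trying it with the model:

```python
import math

image_size = 1024
candidates = ["1,1,2,2,3,3,4,4", "1,1,1,2,2,3,4,4", "1,1,1,1,2,2,4,4"]
for s in candidates:
    parsed = tuple(int(ch_mult) for ch_mult in s.split(","))
    # Each tuple needs log2(1024) - 2 = 8 entries to satisfy the check.
    assert len(parsed) == int(math.log2(image_size)) - 2
print("all candidates have the expected length")
```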
We haven't trained an upsampler for higher resolutions. You can't just change the code and load the old upsampler model; it won't work because it was trained for 256x256.
People on Twitter have been using third-party upsamplers / image super resolution models.