margaretmz / Cartoonizer-with-TFLite

How to create a Cartoonizer Android app with TensorFlow Lite models.

Combine CartoonGAN with ESRGAN

khanhlvg opened this issue · comments


As GAN models usually consume a lot of memory, we can't use them on large images. I wonder if we could convert a small image to a cartoon (e.g. 256x256) and then use ESRGAN to enlarge it (e.g. to 512x512).

@sayakpaul
Could you try this approach in a notebook and see how the quality compares to directly converting a 512x512 image to a cartoon?

@margaretmz FYI

@khanhlvg this is an interesting idea. Thank you.

FYI, our ESRGAN model operates on 128x128 images (the dimension it was trained on), and as far as I know it does not support dynamic shapes. So here's what we can do:

  • Export a CartoonGAN model with 128x128 shape.
  • Use it to cartoonize an image.
  • Use the ESRGAN model to enlarge the cartoonized image to 512x512.
  • Compare.

WDYT?
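The chaining above could be sketched roughly like this (a minimal sketch; the model file names and helper names are my assumptions, not files from this repo):

```python
import numpy as np


def cartoonize_then_upscale(image, run_cartoongan, run_esrgan):
    """Chain the two models: cartoonize small, then 4x super-resolve.

    `run_cartoongan` and `run_esrgan` are callables that each map an
    HxWx3 float image array to a new image array (e.g. wrapped TFLite
    interpreters built with `make_tflite_runner` below).
    """
    assert image.shape[:2] == (128, 128), "CartoonGAN export expects 128x128"
    cartoon = run_cartoongan(image)  # 128x128 cartoonized image
    return run_esrgan(cartoon)       # enlarged to 512x512


def make_tflite_runner(model_path):
    """Wrap a single-input/single-output TFLite model as a callable.

    Requires TensorFlow; the paths passed in (e.g. "cartoongan_128.tflite",
    "esrgan_128_to_512.tflite") are hypothetical exports.
    """
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    def run(image):
        batch = image[np.newaxis].astype(np.float32)  # add batch dim
        interpreter.set_tensor(inp["index"], batch)
        interpreter.invoke()
        return interpreter.get_tensor(out["index"])[0]

    return run
```

Passing the runners in as callables keeps the pipeline testable and makes it easy to swap either stage for a different export.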

@khanhlvg I tried this workflow and here's the Colab Notebook.

Results:

[results image: CartoonGAN 128x128 + ESRGAN 512x512]

Original TFLite model supporting 512x512 shape directly:

[cartoonized image]

Interesting idea, @khanhlvg! And thanks for the Python results, @sayakpaul.

Looking at the screenshots, I'm not sure about the ESRGAN result - in fact the fuzzy 128x128 one looks better than the ESRGAN one.

Also, I'm not sure about chaining the two models together to produce an image for display in the Android UI. Each model's inference already takes time, and combining them will be even slower, resulting in a poorer user experience.
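To put numbers on the latency concern, something like this could time each stage separately (a minimal sketch; `run` stands for any wrapped model invocation, so the sum of the two per-model averages can be compared against a single 512x512 CartoonGAN run):

```python
import time


def average_latency(run, image, warmup=1, iters=10):
    """Rough wall-clock latency (seconds) of one model-run callable."""
    for _ in range(warmup):
        run(image)  # warm-up so one-time setup cost isn't counted
    start = time.perf_counter()
    for _ in range(iters):
        run(image)
    return (time.perf_counter() - start) / iters
```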

Perhaps going from 128 to 512 is too drastic of a change. "If" we could find the right downsample/upsample numbers for a good ESRGAN result, chaining the models together to produce a higher res image could be interesting for this use case: when the user would like to download a high res image from the app.

"If" we could find the right downsample/upsample numbers for a good ESRGAN result

@margaretmz for the current ESRGAN model the numbers are actually already right. The non-distilled version produces 512x512 images (4x upsampling) from 128x128 inputs.

I think the original 512x512 DR and int8 CartoonGAN models do provide good enough outputs.

I think the original 512x512 DR and int8 CartoonGAN models do provide good enough outputs.

Agreed. We don't need the workaround using ESRGAN.


Thanks Sayak for quickly trying out the idea!

I originally thought ESRGAN would be faster than CartoonGAN, but it turned out not to be. Also, the combined quality is not very good.

Therefore, it makes sense to stick with just CartoonGAN.

I'm closing this issue for now. Please feel free to reopen as needed.