margaretmz / Cartoonizer-with-TFLite

How to create a Cartoonizer Android app with TensorFlow Lite models.

Combine CartoonGAN with ESRGAN

khanhlvg opened this issue · comments


As GAN models usually consume a lot of memory, we can't use them on large images. I wonder if we could convert a small image to a cartoon (e.g. 256x256) and then use ESRGAN to enlarge it (e.g. to 512x512).

@sayakpaul
Could you try this approach in a notebook and see how the quality compares to directly converting a 512x512 image to a cartoon?

@margaretmz FYI

@khanhlvg this is an interesting idea. Thank you.

FYI, our ESRGAN model operates on 128x128 images (the dimension it was trained on), and as far as I know it does not support dynamic shapes. So here's what we can do:

  • Export a CartoonGAN model with 128x128 shape.
  • Use it to cartoonize an image.
  • Use the ESRGAN model to enlarge the cartoonized image to 512x512.
  • Compare.

WDYT?
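The chaining above could be sketched roughly like this (a minimal sketch; the model file names and helper names are my assumptions, not files from this repo):

```python
import numpy as np


def cartoonize_then_upscale(image, run_cartoongan, run_esrgan):
    """Chain the two models: cartoonize small, then 4x super-resolve.

    `run_cartoongan` and `run_esrgan` are callables that each map an
    HxWx3 float image array to a new image array (e.g. wrapped TFLite
    interpreters built with `make_tflite_runner` below).
    """
    assert image.shape[:2] == (128, 128), "CartoonGAN export expects 128x128"
    cartoon = run_cartoongan(image)  # 128x128 cartoonized image
    return run_esrgan(cartoon)       # enlarged to 512x512


def make_tflite_runner(model_path):
    """Wrap a single-input/single-output TFLite model as a callable.

    Requires TensorFlow; the paths passed in (e.g. "cartoongan_128.tflite",
    "esrgan_128_to_512.tflite") are hypothetical exports.
    """
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    def run(image):
        batch = image[np.newaxis].astype(np.float32)  # add batch dim
        interpreter.set_tensor(inp["index"], batch)
        interpreter.invoke()
        return interpreter.get_tensor(out["index"])[0]

    return run
```

Passing the runners in as callables keeps the pipeline testable and makes it easy to swap either stage for a different export.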

@khanhlvg I tried this workflow and here's the Colab Notebook.

Results:

[results image: CartoonGAN 128x128 + ESRGAN 512x512]

Original TFLite model supporting 512x512 shape directly:

[cartoonized image]

Interesting idea, @khanhlvg! And thanks for the Python results, @sayakpaul.

Looking at the screenshots, I'm not sure about the ESRGAN result - in fact the fuzzy 128x128 one looks better than the ESRGAN one.

Also, I'm not sure about chaining the two models together to produce an image for display in the Android UI. Each model's inference already takes time, and combining them will be even slower, resulting in a poorer user experience.
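To put numbers on the latency concern, something like this could time each stage separately (a minimal sketch; `run` stands for any wrapped model invocation, so the sum of the two per-model averages can be compared against a single 512x512 CartoonGAN run):

```python
import time


def average_latency(run, image, warmup=1, iters=10):
    """Rough wall-clock latency (seconds) of one model-run callable."""
    for _ in range(warmup):
        run(image)  # warm-up so one-time setup cost isn't counted
    start = time.perf_counter()
    for _ in range(iters):
        run(image)
    return (time.perf_counter() - start) / iters
```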

Perhaps going from 128 to 512 is too drastic of a change. "If" we could find the right downsample/upsample numbers for a good ESRGAN result, chaining the models together to produce a higher res image could be interesting for this use case: when the user would like to download a high res image from the app.

"If" we could find the right downsample/upsample numbers for a good ESRGAN result

@margaretmz for the current ESRGAN model the numbers are actually already right. The non-distilled version produces 512x512 images (4x upsampling) from 128x128 inputs.

I think the original 512x512 DR and int8 CartoonGAN models do provide good enough outputs.

I think the original 512x512 DR and int8 CartoonGAN models do provide good enough outputs.

Agreed. We don't need the workaround using ESRGAN.


Thanks Sayak for quickly trying out the idea!

I originally thought ESRGAN would be faster than CartoonGAN, but it turned out not to be. Also, the combined quality is not very good.

Therefore, it makes sense to stick with just CartoonGAN.

I'm closing this issue for now. Please feel free to reopen as needed.