MAR for Image-to-Image Generation

Question

Bili-Sakura opened this issue 3 months ago · comments

I am new to MAR; so far I have only explored LDMs for image generation. Can MAR be used for image-to-image generation, especially image editing (e.g. temporal generation)? Also, can a MAR model (with diffusion loss) trained on a large-scale dataset be fine-tuned on a domain-specific dataset for temporal generation?

Again, awesome work!

Tianhong Li · Answer 1 · Wed Aug 14 2024 19:02:06 GMT+0800 (China Standard Time)

Thanks for your interest! Similar to MaskGIT and MAGE, MAR can naturally perform mask-based image editing tasks such as image inpainting, outpainting, and uncropping. If the temporal generation you mentioned is for video, then it would be hard to take a model trained to generate images and fine-tune it to generate video.
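To make "mask-based editing" concrete, here is a minimal sketch of how a pixel-space edit region could be turned into a token-level mask, which is the kind of input a masked generative model regenerates. The function name `box_to_token_mask`, the 256-pixel image size, and the 16-pixel token stride are illustrative assumptions, not this repo's API.

```python
import numpy as np

def box_to_token_mask(img_size, patch_size, box):
    """Hypothetical helper: flat 0/1 mask over the token grid.

    1 = token to regenerate, 0 = token kept from the input image.
    box = (top, left, bottom, right) in pixels; bottom/right are exclusive.
    """
    grid = img_size // patch_size
    top, left, bottom, right = box
    # Mask every token whose patch overlaps the box at all.
    r0, c0 = top // patch_size, left // patch_size
    r1 = -(-bottom // patch_size)  # ceiling division
    c1 = -(-right // patch_size)
    mask = np.zeros((grid, grid), dtype=np.int64)
    mask[r0:r1, c0:c1] = 1
    return mask.reshape(-1)

# Inpainting: regenerate only the tokens under the box.
mask = box_to_token_mask(img_size=256, patch_size=16, box=(64, 64, 128, 130))
# Outpainting / uncropping: keep the box and regenerate everything else.
outpaint_mask = 1 - mask
```

Under these assumptions, inpainting and outpainting differ only in which tokens are marked as unknown; the generation procedure itself would stay the same.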

Sakura · Answer 2 · Wed Aug 14 2024 19:33:30 GMT+0800 (China Standard Time)

Thank you!
I am working on satellite image generation, where each image captures the same region at a different timestamp (this is what I meant by "temporal"; I am not sure whether this task can be categorized as inpainting).
Temporal layers used in LDMs for video synthesis have been incorporated into ControlNet so that LDMs can generate temporal images. I wonder if there are similar techniques for MAR.

Tianhong Li · Answer 3 · Wed Aug 14 2024 22:19:19 GMT+0800 (China Standard Time)

The MAR framework in this repo is designed to generate images and cannot be directly used to generate videos. However, it should be easy to adapt the code a bit for the task you described, similar to LDM for video synthesis.
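One hypothetical way to frame the paired-timestamp task as masked generation (not something the repo provides, and not confirmed in this thread) is to concatenate the tokens of a reference image with those of the target image and mark only the target half as unknown. The sketch below shows just that bookkeeping; `build_pair_sequence` and the token shapes are assumptions for illustration.

```python
import numpy as np

def build_pair_sequence(ref_tokens, tgt_tokens):
    """Hypothetical helper: join reference and target tokens into one sequence.

    ref_tokens, tgt_tokens: arrays of shape (num_tokens, dim).
    Returns the joined sequence and a 0/1 mask where 1 = token to generate.
    """
    assert ref_tokens.shape == tgt_tokens.shape
    seq = np.concatenate([ref_tokens, tgt_tokens], axis=0)
    mask = np.concatenate([
        np.zeros(len(ref_tokens), dtype=np.int64),  # reference: always visible
        np.ones(len(tgt_tokens), dtype=np.int64),   # target: to be generated
    ])
    return seq, mask

# Example with assumed sizes: 256 tokens of dimension 16 per image.
ref = np.zeros((256, 16), dtype=np.float32)
tgt = np.zeros((256, 16), dtype=np.float32)
seq, mask = build_pair_sequence(ref, tgt)
```

Whether a pretrained model handles the doubled sequence length without architectural changes is an open question here; position embeddings in particular would likely need adapting.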