shreshtashetty / CycleGANPhototoMonet

Converting photos into Monet style paintings using CycleGANs with Differentiable Augmentation.


CycleGAN with Differentiable Augmentation for Style Transfer

As a reference, here is the main CycleGAN paper: J.-Y. Zhu et al., "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks". Link: https://arxiv.org/pdf/1703.10593.pdf

and the Differentiable Augmentation paper: S. Zhao et al., "Differentiable Augmentation for Data-Efficient GAN Training". Link: https://arxiv.org/pdf/2006.10738.pdf

The training dataset consists of Monet paintings and photos that need to be converted into Monet-style paintings. The two datasets are not paired.

CycleGAN:

Task: We need to translate images from a source domain X (that of original photos) to a target domain Y (that of Monet paintings).

Vanilla GAN

Very simply put, a vanilla GAN consists of a Generator and a Discriminator. The Generator tries to generate images of our required type. The Discriminator is fed real images from our training set along with images produced by the Generator, and it tries to distinguish the real images from the fake ones. The Generator has to 'fool' the Discriminator by generating images so close to the real image distribution that the Discriminator labels these fake images as real ones.
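
To make the adversarial game concrete, here is a minimal sketch of one training step, assuming TensorFlow/Keras and hypothetical `generator` and `discriminator` models (the notebook's actual training loop may differ):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def gan_train_step(generator, discriminator, g_opt, d_opt, real_images, noise):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)

        # The Generator wants fakes labelled as real (1s)...
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
        # ...while the Discriminator wants reals labelled 1 and fakes labelled 0.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)

    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    return g_loss, d_loss
```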

Why Style Transfer needs a new architecture

In this task, however, we need to learn a mapping G from domain X to domain Y. This one constraint alone does not give us a unique mapping function-- there are infinitely many mappings G that map X to Y-- and it also gives no robustness against mode collapse. In order to constrain our system further, we introduce cycle consistency, which exploits the fact that translation should be cycle consistent-- i.e., an image mapped from X to Y and back to X should yield (approximately) the original image.

The CycleGAN architecture

Therefore we have 2 Generators: Gx (G in the figure), mapping from X to Y, and Gy (F in the figure), mapping from Y to X. We also need 2 adversarial Discriminators: Dx, which encourages Gx to translate images from domain X into images whose distribution is extremely close to that of domain Y, and Dy, which does the same for Gy when translating images from domain Y to domain X.

Generator: The Generator used in this repository is a U-Net with skip connections. We have a Monet Generator (photo -> Monet) and a Photo Generator (Monet -> photo).
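
A minimal sketch of such a U-Net generator in TensorFlow/Keras (the layer widths and the normalization choice here are assumptions, not necessarily what the notebook uses):

```python
import tensorflow as tf
from tensorflow.keras import layers

def downsample(filters):
    # Conv -> instance-style norm -> LeakyReLU; halves the spatial size.
    return tf.keras.Sequential([
        layers.Conv2D(filters, 4, strides=2, padding='same', use_bias=False),
        layers.GroupNormalization(groups=-1),  # groups=-1 == instance norm (Keras >= 2.11)
        layers.LeakyReLU()])

def upsample(filters):
    # Transposed conv doubles the spatial size back up.
    return tf.keras.Sequential([
        layers.Conv2DTranspose(filters, 4, strides=2, padding='same', use_bias=False),
        layers.GroupNormalization(groups=-1),
        layers.ReLU()])

def unet_generator(img_size=256):
    inputs = layers.Input(shape=[img_size, img_size, 3])
    down_stack = [downsample(f) for f in (64, 128, 256, 512)]
    up_stack = [upsample(f) for f in (256, 128, 64)]

    x, skips = inputs, []
    for down in down_stack:                   # 256 -> 128 -> 64 -> 32 -> 16
        x = down(x)
        skips.append(x)
    for up, skip in zip(up_stack, reversed(skips[:-1])):
        x = up(x)
        x = layers.Concatenate()([x, skip])   # the skip connection
    outputs = layers.Conv2DTranspose(3, 4, strides=2, padding='same',
                                     activation='tanh')(x)  # back to 256x256x3
    return tf.keras.Model(inputs, outputs)
```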

Discriminator: The Discriminator used in this repository is a PatchGAN-type discriminator that outputs a 30x30 map rather than a single scalar. Each cell of the 30x30 output classifies a 70x70 patch of the input image as real or fake; higher values indicate a real classification and lower values a fake classification.
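
A sketch of a 70x70 PatchGAN that produces a 30x30 output map for 256x256 inputs, following the pix2pix/CycleGAN convention (the layer widths are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

def patchgan_discriminator(img_size=256):
    inputs = layers.Input(shape=[img_size, img_size, 3])
    x = inputs
    for filters in (64, 128, 256):               # 256 -> 128 -> 64 -> 32
        x = layers.Conv2D(filters, 4, strides=2, padding='same')(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.ZeroPadding2D()(x)                # 32 -> 34
    x = layers.Conv2D(512, 4, strides=1)(x)      # 34 -> 31
    x = layers.LeakyReLU(0.2)(x)
    x = layers.ZeroPadding2D()(x)                # 31 -> 33
    # One logit per 70x70 receptive-field patch; sigmoid is left to the loss.
    outputs = layers.Conv2D(1, 4, strides=1)(x)  # 33 -> 30x30x1
    return tf.keras.Model(inputs, outputs)
```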

(Figure: the CycleGAN architecture, from [1].)

Loss Functions:

Generator loss function: The Generator wants to fool the Discriminator into thinking the generated image is real. For a perfect Generator, the Discriminator would output only 1s. Thus, the loss compares the Discriminator's output on the generated image to a matrix of 1s.

Discriminator loss function: The Discriminator loss compares the Discriminator's output on real images to a matrix of 1s and its output on fake images to a matrix of 0s. A perfect Discriminator would output all 1s for real images and all 0s for fake images. The Discriminator loss is the average of the real and fake losses.

Cycle consistency loss function: We want the original photo and the twice-transformed photo to be similar to one another. Thus, we calculate the cycle consistency loss as the mean absolute difference between the two images.

Identity loss function: The identity loss compares an image with the output of its own-domain Generator (i.e., a Monet with the Monet Generator, a photo with the Photo Generator). If given a Monet as input, the Monet Generator should return (approximately) the same image, since the input already belongs to the target domain.
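
Putting the four losses together, a minimal sketch in TensorFlow (the lambda weights follow the CycleGAN paper's defaults and are assumptions about this notebook):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(disc_fake_output):
    # A perfect Generator makes the Discriminator output all 1s on fakes,
    # so compare its verdict on fakes against a matrix of 1s.
    return bce(tf.ones_like(disc_fake_output), disc_fake_output)

def discriminator_loss(disc_real_output, disc_fake_output):
    real_loss = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake_loss = bce(tf.zeros_like(disc_fake_output), disc_fake_output)
    return (real_loss + fake_loss) * 0.5   # average of the real and fake terms

def cycle_loss(real_image, cycled_image, lam=10.0):
    # Mean absolute (L1) difference between the original and the
    # twice-translated image.
    return lam * tf.reduce_mean(tf.abs(real_image - cycled_image))

def identity_loss(real_image, same_image, lam=10.0):
    # Feeding a Monet to the Monet Generator should return (almost) the same image.
    return 0.5 * lam * tf.reduce_mean(tf.abs(real_image - same_image))
```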

Evaluation Metric:

Frechet Inception Distance (FID): The FID metric is the squared Wasserstein-2 distance between two multidimensional Gaussian distributions-- the distribution of deep-network features of the images generated by the GAN, and the distribution of the same features computed on the real images used to train the GAN. The Inception v3 network trained on ImageNet is the usual choice of feature extractor. As a result, FID can be computed from the mean and the covariance of the activations when the synthesized and real images are fed into the Inception network as follows:

FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 * (Sigma_r Sigma_g)^(1/2))

where (mu_r, Sigma_r) and (mu_g, Sigma_g) are the mean and covariance of the real and generated activations respectively; lower is better.
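
A small sketch of the computation, assuming the Inception activations have already been extracted into NumPy arrays:

```python
import numpy as np
from scipy import linalg

def fid(real_feats, fake_feats):
    """FID from Inception activations of shape (n_samples, n_features)."""
    mu_r, mu_g = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(fake_feats, rowvar=False)
    # Matrix square root of the product of the two covariances.
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    covmean = covmean.real  # discard tiny imaginary parts from numerical error
    return (np.sum((mu_r - mu_g) ** 2)
            + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```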

Additional Details:

Differentiable Augmentation: The dataset used for training had 300 Monet images and 7308 photos. Since there was very little Monet image data, the model overfit, which called for some kind of augmentation. Differentiable Augmentation applies an augmentation to the real images as well as the generated images before they are passed to the Discriminator. Because the gradients used to update the Generator flow back through the Discriminator and the augmentation function, the augmentation must be differentiable. Refer to the paper for a more in-depth explanation.
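
A minimal sketch of the idea with two differentiable policies (color and translation); the paper's full DiffAugment also includes contrast, saturation, and cutout, and shifts each image independently rather than once per batch:

```python
import tensorflow as tf

def diff_augment(images, brightness=0.2, max_shift=0.125):
    # Color: a random brightness shift per image; plain addition, so gradients flow.
    b = tf.shape(images)[0]
    images = images + tf.random.uniform([b, 1, 1, 1], -brightness, brightness)
    # Translation: pad, then randomly crop back to the original size. Cropping
    # is just slicing, so it is also differentiable (one shared offset per
    # batch here, for simplicity).
    h, w = tf.shape(images)[1], tf.shape(images)[2]
    pad_h = tf.cast(tf.cast(h, tf.float32) * max_shift, tf.int32)
    pad_w = tf.cast(tf.cast(w, tf.float32) * max_shift, tf.int32)
    padded = tf.pad(images, [[0, 0], [pad_h, pad_h], [pad_w, pad_w], [0, 0]],
                    mode='REFLECT')
    return tf.image.random_crop(padded, size=tf.shape(images))

# Both real and generated batches pass through the same augmentation before
# the Discriminator sees them:
#   d_real = discriminator(diff_augment(real_images))
#   d_fake = discriminator(diff_augment(generated_images))
```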

(Figure: the Differentiable Augmentation pipeline, from [2].)

Dataset used: Images for training and testing are obtained from Kaggle. Their getting-started competition on CycleGANs-- "I'm Something of a Painter, Myself"-- hosts the entire dataset as a downloadable .zip file. The data is also provided as TFRecords, which can be used directly.
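
For example, the TFRecords can be read with tf.data; the feature keys below match the competition's documented schema, but verify them against your download:

```python
import tensorflow as tf

IMAGE_SIZE = [256, 256]

def decode_image(example):
    features = {
        'image_name': tf.io.FixedLenFeature([], tf.string),
        'image':      tf.io.FixedLenFeature([], tf.string),
        'target':     tf.io.FixedLenFeature([], tf.string),
    }
    parsed = tf.io.parse_single_example(example, features)
    image = tf.io.decode_jpeg(parsed['image'], channels=3)
    image = tf.cast(image, tf.float32) / 127.5 - 1.0   # scale to [-1, 1]
    return tf.reshape(image, [*IMAGE_SIZE, 3])

# 'monet_tfrec/*.tfrec' is the directory layout of the Kaggle download.
monet_ds = (tf.data.TFRecordDataset(tf.io.gfile.glob('monet_tfrec/*.tfrec'))
            .map(decode_image, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(1))
```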

Citations

[1]

@inproceedings{CycleGAN2017, 
  title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks}, 
  author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A}, 
  booktitle={Computer Vision (ICCV), 2017 IEEE International Conference on}, 
  year={2017} 
}

[2]

@inproceedings{zhao2020diffaugment, 
  title={Differentiable Augmentation for Data-Efficient GAN Training}, 
  author={Zhao, Shengyu and Liu, Zhijian and Lin, Ji and Zhu, Jun-Yan and Han, Song}, 
  booktitle={Conference on Neural Information Processing Systems (NeurIPS)}, 
  year={2020}
}

If you find this repository useful, please cite the following:

@misc{Shreshta2021CycleGANPhototoMonet,
  author = {Shetty, Shreshta},
  title = {CycleGANPhototoMonet},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/shreshtashetty/CycleGANPhototoMonet}},
}
