AloneTogetherY / text-to-image-synthesis

Text to image synthesis with GAN-CLS and MSGAN

Exploring GAN-CLS and MSGAN with a bird-dataset

I combine the GAN-CLS algorithm by Reed et al. [1] with the MSGAN regularization term by Mao et al. [2] and experiment with the Caltech bird dataset. I experimented with the GAN architecture proposed by Ledig et al. [3].
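
To make the two ingredients concrete, here is a minimal NumPy sketch of the matching-aware discriminator objective from [1] and a mode-seeking term in the spirit of [2]. It is not the code from the notebook: the function names, the `EPS` constant, the L1 distances, and the choice to minimize the inverse ratio are assumptions made for illustration.

```python
import numpy as np

EPS = 1e-8  # hypothetical stability constant, not taken from the notebook


def _clip(scores):
    """Keep discriminator scores strictly inside (0, 1) so the logs are finite."""
    return np.clip(np.asarray(scores, dtype=np.float64), EPS, 1.0 - EPS)


def gan_cls_discriminator_loss(s_real, s_wrong, s_fake):
    """Negated GAN-CLS discriminator objective.

    s_real:  scores for (real image, matching text)
    s_wrong: scores for (real image, mismatching text)
    s_fake:  scores for (generated image, matching text)
    """
    s_real, s_wrong, s_fake = _clip(s_real), _clip(s_wrong), _clip(s_fake)
    objective = np.log(s_real) + 0.5 * (np.log(1.0 - s_wrong) + np.log(1.0 - s_fake))
    return -np.mean(objective)


def generator_adversarial_loss(s_fake):
    """Negated generator objective: the generator wants s_fake close to 1."""
    return -np.mean(np.log(_clip(s_fake)))


def mode_seeking_ratio(img1, img2, z1, z2):
    """Image distance over noise distance for two samples with the same caption."""
    d_img = np.mean(np.abs(img1 - img2))  # assumption: L1 distance between images
    d_z = np.mean(np.abs(z1 - z2))        # assumption: L1 distance between noise vectors
    return d_img / (d_z + EPS)


def mode_seeking_loss(img1, img2, z1, z2):
    """Small when different noise vectors lead to clearly different images."""
    return 1.0 / (mode_seeking_ratio(img1, img2, z1, z2) + EPS)
```

Under these assumptions, the generator would minimize `generator_adversarial_loss` plus a weighted `mode_seeking_loss`; the weight is a hyperparameter that the text above does not specify.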

Usage

  1. Refer to the READMEs in the folders images, captions, and word2vec_pretrained_model to obtain the necessary data.
  2. Run python process_images.py to resize and normalize the images and generate numpy arrays.
  3. Run python process_captions.py to generate sentence embeddings for the captions.
  4. Upload the generated image vectors, sentence vectors, and pretrained word2vec model to a Google Drive account.
  5. Import the jupyter notebook Text2Image_GAN_MS.ipynb in Google Colab and load the data.
  6. Run code snippets in Google Colab.
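
The preprocessing in steps 2 and 3 can be sketched roughly as follows. This is not the content of process_images.py or process_captions.py: the nearest-neighbor resize, the [-1, 1] pixel range, the averaging of word vectors, and all function names are assumptions, and a plain dictionary stands in for the pretrained word2vec model.

```python
import numpy as np


def resize_nearest(image, size):
    """Resize an H x W x C array to size x size using nearest-neighbor sampling."""
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size  # source row for each output row
    cols = np.arange(size) * width // size   # source column for each output column
    return image[rows[:, None], cols]


def normalize(image):
    """Map uint8 pixel values from [0, 255] to floats in [-1, 1]."""
    return image.astype(np.float32) / 127.5 - 1.0


def sentence_embedding(caption, word_vectors, dim):
    """Average the vectors of all known words; unknown words are skipped."""
    tokens = caption.lower().split()
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(known, axis=0).astype(np.float32)


# Toy usage with made-up data.
toy_image = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
image_vector = normalize(resize_nearest(toy_image, 2))

toy_vectors = {"red": np.array([1.0, 0.0]), "bird": np.array([0.0, 1.0])}
caption_vector = sentence_embedding("A red bird", toy_vectors, dim=2)
```

Arrays produced this way can be stored with `np.save` and loaded again in the notebook with `np.load`.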

Results

I trained the GAN model for 960 epochs, using the Adam optimizer [4] for both the discriminator and the generator with a learning rate of 0.000035 and beta_1 = 0.5. Most of the synthesized images depict plausible bird colors and shapes, and the samples appear diverse; however, the GAN showed some minor mode collapse when generating images from made-up captions, as seen below.
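
For reference, a single Adam update [4] with the hyperparameters above looks roughly like this in NumPy. The values of `BETA_2` and `EPSILON` are not stated here; the sketch assumes the defaults suggested in [4], and `adam_step` is a hypothetical helper rather than code from the notebook.

```python
import numpy as np

LEARNING_RATE = 0.000035  # from the text above
BETA_1 = 0.5              # from the text above
BETA_2 = 0.999            # assumption: default from Kingma and Ba
EPSILON = 1e-8            # assumption: default from Kingma and Ba


def adam_step(param, grad, m, v, t):
    """Apply one Adam update; t is the 1-based step count."""
    m = BETA_1 * m + (1.0 - BETA_1) * grad       # first-moment estimate
    v = BETA_2 * v + (1.0 - BETA_2) * grad ** 2  # second-moment estimate
    m_hat = m / (1.0 - BETA_1 ** t)              # bias correction
    v_hat = v / (1.0 - BETA_2 ** t)
    param = param - LEARNING_RATE * m_hat / (np.sqrt(v_hat) + EPSILON)
    return param, m, v


# First step on a toy parameter vector.
weights = np.array([1.0, -1.0])
gradient = np.array([0.5, -2.0])
new_weights, m1, v1 = adam_step(weights, gradient, np.zeros(2), np.zeros(2), t=1)
```

On the first step the bias-corrected moments reduce to the gradient and its square, so each weight moves by about one learning rate in the direction opposite to its gradient.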

Interpolating between sentence vectors
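
One common way to interpolate is a straight line between two sentence embeddings, with each intermediate vector fed to the generator. The sketch below assumes linear interpolation, which may differ from what the notebook does; `interpolate` is a hypothetical helper.

```python
import numpy as np


def interpolate(v_start, v_end, steps):
    """Return `steps` vectors evenly spaced on the line from v_start to v_end."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]  # column of mixing weights
    return (1.0 - alphas) * v_start + alphas * v_end


# Toy usage: five vectors between two made-up 4-dimensional embeddings.
start = np.zeros(4)
end = 2.0 * np.ones(4)
path = interpolate(start, end, steps=5)
```

Keeping the noise vector fixed while moving along `path` would isolate the effect of the caption on the generated image; that setup is also an assumption.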

References

[1] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. Generative adversarial text-to-image synthesis. In Proceedings of The 33rd International Conference on Machine Learning, 2016.
[2] Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, and Ming-Hsuan Yang. Mode seeking generative adversarial networks for diverse image synthesis. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[3] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.
[4] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
