dgjun32 / T2I_CLIP-GAN


Fine-tuning pretrained CLIP using DAMSM and contrastive loss for text-to-image synthesis

1. Methodology

The neural network for text-to-image generation is composed of two sub-networks:

a text encoder and a generator network.

Training a text-to-image generator therefore proceeds in two steps.

  1. The image encoder and text encoder are jointly pretrained on image-caption pairs, projecting images and text into a common embedding space.
  2. After the text encoder is pretrained, the generator network is adversarially trained to generate realistic images conditioned on the text features.

Recent work proposed using the DAMSM loss together with a contrastive loss for pretraining the text encoder and then training DM-GAN, reaching state-of-the-art results.

In this work, we replace the RNN-based text encoder and the CNN-based image encoder with CLIP, a pretrained multimodal vision-language model based on the Transformer architecture.
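As a concrete illustration (not this repo's exact code), the snippet below is a minimal sketch of how CLIP can serve as both text and image encoder via the official `clip` package; the `ViT-B/32` backbone and the placeholder caption/image are assumptions.

```python
import numpy as np
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# Backbone choice is an assumption; any CLIP variant exposes the same interface.
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder caption and image, just to show the shared embedding space.
tokens = clip.tokenize(["a small bird with a red head and a white belly"]).to(device)
image = preprocess(Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))).unsqueeze(0).to(device)

with torch.no_grad():
    text_feat = model.encode_text(tokens)    # (1, 512) sentence embedding
    image_feat = model.encode_image(image)   # (1, 512) image embedding in the same space
```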

2. CLIP

CLIP is a multimodal encoder for images and natural language, pretrained with a contrastive loss and a very large batch size (32,768).

See the CLIP paper (https://arxiv.org/abs/2103.00020) and the official PyTorch implementation (https://github.com/openai/CLIP).
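For reference, here is a minimal PyTorch sketch of the symmetric contrastive (InfoNCE) objective that CLIP is pretrained with. It is an illustrative re-implementation, not this repo's or OpenAI's code; the temperature initialization follows the value reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLIPContrastiveLoss(nn.Module):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""

    def __init__(self):
        super().__init__()
        # Learnable temperature, initialized to 1/0.07 as in the CLIP paper.
        self.logit_scale = nn.Parameter(torch.log(torch.tensor(1.0 / 0.07)))

    def forward(self, image_feat, text_feat):
        # image_feat, text_feat: (N, D); the i-th image and i-th text are a pair.
        image_feat = F.normalize(image_feat, dim=-1)
        text_feat = F.normalize(text_feat, dim=-1)
        logits = self.logit_scale.exp() * image_feat @ text_feat.t()   # (N, N) similarities
        labels = torch.arange(logits.size(0), device=logits.device)
        loss_i2t = F.cross_entropy(logits, labels)        # match each image to its caption
        loss_t2i = F.cross_entropy(logits.t(), labels)    # match each caption to its image
        return (loss_i2t + loss_t2i) / 2
```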

3. Prepared Data

Download the preprocessed datasets from AttnGAN.

Alternatively, the same data can be downloaded from DM-GAN.

4. Trained model

5. Training

  1. Fine-tuning the pretrained CLIP encoder (a minimal fine-tuning sketch follows this list)
  • With CUB-200-2011, using DAMSM + contrastive loss : `$ python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0`

  • With COCO2014, using DAMSM + contrastive loss : `$ python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 0`

  2. Training DM-GAN
  • With CUB-200-2011 : `$ python main.py --cfg cfg/clip_bird_DMGAN.yml --gpu 0`

  • With COCO2014 : `$ python main.py --cfg cfg/clip_coco_DMGAN.yml --gpu 0`
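Conceptually, step 1 fine-tunes CLIP's two encoders on image-caption pairs under a combined DAMSM + contrastive objective. The skeleton below is only a sketch of that idea: the data loader, the `damsm_loss`/`contrastive_loss` callables, the learning rate, and the `ViT-B/32` backbone are assumptions, not the repo's actual code or hyperparameters.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device, jit=False)  # jit=False so the weights are trainable
model.float()                                               # train in fp32 for numerical stability

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)   # assumed learning rate

def finetune_epoch(dataloader, damsm_loss, contrastive_loss, lam=1.0):
    """One epoch of fine-tuning. `damsm_loss` and `contrastive_loss` are placeholder
    callables taking (image_features, text_features); `lam` weights the contrastive term."""
    model.train()
    for images, tokenized_captions in dataloader:           # captions pre-tokenized with clip.tokenize
        images = images.to(device)
        tokenized_captions = tokenized_captions.to(device)
        img_feat = model.encode_image(images)
        txt_feat = model.encode_text(tokenized_captions)
        loss = damsm_loss(img_feat, txt_feat) + lam * contrastive_loss(img_feat, txt_feat)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```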

6. Evaluation

  1. Generate fake images and compute R-precision (a sketch of the metric follows this list)
  • CUB-200-2011 : `$ python main.py --cfg cfg/eval_clip_bird.yml`

  • COCO2014 : `$ python main.py --cfg cfg/eval_clip_coco.yml`

  2. Compute FID (Fréchet Inception Distance; a sketch of the distance follows this list)
  • CUB-200-2011 : `$ python fid_score.py --data bird --dims 2048 --batch_size 32`

  • COCO2014 : `$ python fid_score.py --data coco --dims 2048 --batch_size 32`

  3. Compute Inception Score
  • CUB-200-2011 : `$ python inception_score.py --data bird`

  • COCO2014 : `$ python inception_score.py --data coco`
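For step 1, R-precision checks whether a generated image is closer to its own caption than to mismatched ones. The sketch below assumes features from the (fine-tuned) encoder and 99 distractor captions per image; it illustrates the metric, not the repo's exact evaluation code.

```python
import torch
import torch.nn.functional as F

def r_precision(img_feat, true_txt_feat, distractor_txt_feat):
    """R-precision over a batch: for each generated image, rank its ground-truth caption
    against 99 mismatched captions by cosine similarity and count top-1 hits.

    img_feat:            (N, D)     features of generated images
    true_txt_feat:       (N, D)     features of the matching captions
    distractor_txt_feat: (N, 99, D) features of mismatched captions
    """
    img = F.normalize(img_feat, dim=-1).unsqueeze(1)                       # (N, 1, D)
    cands = torch.cat([true_txt_feat.unsqueeze(1), distractor_txt_feat], dim=1)
    cands = F.normalize(cands, dim=-1)                                     # (N, 100, D)
    sims = (img * cands).sum(-1)                                           # (N, 100) cosine sims
    top1 = sims.argmax(dim=1)                                              # index 0 == ground truth
    return (top1 == 0).float().mean().item()
```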
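For step 2, FID is the Fréchet distance between two Gaussians fitted to Inception-v3 activations of real and generated images. The following is a minimal sketch of that closed-form distance given precomputed means and covariances; the repo's `fid_score.py` presumably also handles the feature extraction.

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}),
    where (mu, S) are the mean and covariance of Inception activations."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if not np.isfinite(covmean).all():
        # Numerical fallback: add a small offset to the diagonals.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean, _ = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset), disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean)
```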

7. Citation
