
VICTR: Visual Information Captured Text Representation for Text-to-Image Generation Tasks

This repository contains the code for the paper VICTR: Visual Information Captured Text Representation for Text-to-Image Generation Tasks.

Han, C.*, Long, S.*, Luo, S., Wang, K., & Poon, J. (2020, December). VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), pp. 3107-3117.


1. Introduction

The proposed VICTR representation for text-to-image multimodal tasks contains two major types of embedding: (1) basic graph embedding (for objects, relations, and attributes) and (2) positional graph embedding (for objects and relations), which capture rich visual-semantic information about the objects in a text description. This repository provides the integration of the proposed VICTR representation into three original text-to-image generation models: StackGAN, AttnGAN, and DM-GAN.
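To make the idea concrete, below is a minimal, hypothetical sketch of how the two graph embeddings could be fused with DAMSM word features. It is not the released implementation: the module name, dimensions, and the attention-style fusion are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VICTRFusionSketch(nn.Module):
    """Illustrative only: fuse basic + positional graph embeddings
    with DAMSM word features via scaled dot-product attention."""

    def __init__(self, word_dim=256, graph_dim=100):
        super().__init__()
        # project concatenated basic/positional graph features into word space
        self.proj = nn.Linear(2 * graph_dim, word_dim)

    def forward(self, word_feats, basic_graph, pos_graph):
        # word_feats:  (B, T, word_dim)  DAMSM word embeddings
        # basic_graph: (B, G, graph_dim) object/relation/attribute embeddings
        # pos_graph:   (B, G, graph_dim) positional object/relation embeddings
        graph = self.proj(torch.cat([basic_graph, pos_graph], dim=-1))
        attn = torch.bmm(word_feats, graph.transpose(1, 2))          # (B, T, G)
        attn = F.softmax(attn / word_feats.size(-1) ** 0.5, dim=-1)
        return word_feats + torch.bmm(attn, graph)                   # enriched words
```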

2. Main code structure and running requirements

Root ---> repository
  code ---> the main code for the three models
    stackgan_victr ---> main code for StackGAN+VICTR
    attngan_victr ---> main code for AttnGAN+VICTR
    dmgan_victr ---> main code for DM-GAN+VICTR
  DAMSMencoders ---> pretrained DAMSM text/image encoder from AttnGAN
  data
    coco ---> COCO2014 images and related data files
      train ---> train-related data files
      test ---> test-related data files
  output ---> model output

Environment for running the code:

  • Python 3.6

  • PyTorch 1.4.0 (pip install torch==1.4.0 torchvision==0.5.0)

3. Setup and data preparation

3.1 Original text-to-image related setup

Preprocessed COCO metadata

COCO metadata provided by AttnGAN

  • download and unzip it to data/

Pretrained DAMSM text/image encoder

DAMSM for COCO provided by AttnGAN

  • download and unzip it to DAMSMencoders/

  • for training the DAMSM model, please refer to AttnGAN

3.2 COCO2014 images for training and evaluation

Training: wget http://images.cocodataset.org/zips/train2014.zip

Evaluation: wget http://images.cocodataset.org/zips/val2014.zip

  • After downloading, unzip all the images into the data/coco/images/ folder (an optional Python alternative is sketched below)
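If you prefer to stay in Python, a standard-library equivalent of the wget + unzip steps above might look like this (illustrative; the archives are several GB each):

```python
# Download the COCO2014 train/val archives and extract them under
# data/coco/images/ (equivalent to the wget + unzip steps above).
import urllib.request
import zipfile

for split in ("train2014", "val2014"):
    archive = split + ".zip"
    urllib.request.urlretrieve(
        "http://images.cocodataset.org/zips/" + archive, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall("data/coco/images/")
```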

3.3 Preprocessed caption graphs and trained embeddings of VICTR

Processed caption graphs:

  • Training: run python google_drive.py 1LVnM22QKO6hbCzQ173EjOvNBLCJ7JopP victr_sg_train.zip, then unzip it to data/coco/train/
  • Evaluation: run python google_drive.py 1KhJezwScr_yd7wfeyczSRjDuf7IYaNDp victr_sg_test.zip, then unzip it to data/coco/test/

Trained graph embeddings: run python google_drive.py 1lr7Mcw6R6cr5zYnjYJ_ckmnkR0ARYa3q victr_graph.zip, then unzip it to data/coco/
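The repository's google_drive.py fetches a file from Google Drive by its ID. A helper of this kind typically looks like the sketch below (illustrative, using requests; the shipped script may differ):

```python
# Usage: python google_drive.py <file_id> <destination>
# Downloads a (large) Google Drive file, handling the virus-scan
# confirmation token that Drive returns for big archives.
import sys
import requests

def download_from_google_drive(file_id, destination):
    url = "https://docs.google.com/uc?export=download"
    session = requests.Session()
    response = session.get(url, params={"id": file_id}, stream=True)
    # Large files first return a warning page with a confirmation cookie.
    token = next((v for k, v in response.cookies.items()
                  if k.startswith("download_warning")), None)
    if token:
        response = session.get(
            url, params={"id": file_id, "confirm": token}, stream=True)
    with open(destination, "wb") as f:
        for chunk in response.iter_content(chunk_size=32768):
            if chunk:
                f.write(chunk)

if __name__ == "__main__":
    download_from_google_drive(sys.argv[1], sys.argv[2])
```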

4. Training

Go to the main code directory of the corresponding model and run the training command:

  • attnGAN-VICTR: cd code/attngan_victr and python main.py --cfg cfg/coco_attn2.yml --gpu 0 --use_sg

  • DM-GAN-VICTR: cd code/dmgan_victr and python main.py --cfg cfg/coco_DMGAN.yml --gpu 0 --use_sg

The saved models will be available under the output folder. The number of training epochs and the checkpoint-saving interval can be changed by setting MAX_EPOCH and TRAIN.SNAPSHOT_INTERVAL in the corresponding training yml files (a sketch for editing them follows the list below):

  • attnGAN-VICTR: code/attngan_victr/cfg/coco_attn2.yml

  • DM-GAN-VICTR: code/dmgan_victr/cfg/coco_DMGAN.yml
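For instance, both values can be adjusted programmatically with PyYAML. The values below are illustrative, and the key names assume the AttnGAN-style cfg layout referenced above (TRAIN.MAX_EPOCH, TRAIN.SNAPSHOT_INTERVAL):

```python
# Adjust the epoch budget and checkpoint frequency in a training cfg.
# Note: safe_dump rewrites the file, dropping comments and key order.
import yaml

with open("cfg/coco_attn2.yml") as f:
    cfg = yaml.safe_load(f)

cfg["TRAIN"]["MAX_EPOCH"] = 120        # total training epochs (illustrative)
cfg["TRAIN"]["SNAPSHOT_INTERVAL"] = 5  # save a checkpoint every 5 epochs

with open("cfg/coco_attn2.yml", "w") as f:
    yaml.safe_dump(cfg, f)
```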

5. Evaluation

  1. Set TRAIN.NET_G and TRAIN.SG_ATTN in the evaluation yml files to the paths of the saved models (e.g. NET_G: '../models/netG_epoch_128.pth' and SG.SG_ATTN: '../models/attnsg_epoch_128.pth'), and make sure B_VALIDATION is set to True, so that the COCO2014 eval set is used for generation:
  • attnGAN-VICTR: code/attngan_victr/cfg/eval_coco.yml

  • DM-GAN-VICTR: code/dmgan_victr/cfg/eval_coco.yml

  2. Run the following command:

python main.py --cfg cfg/eval_coco.yml --gpu 0 --use_sg

  3. Evaluation metrics: after running the evaluation code, the generated images can be found in the folder under the model path. R-precision for the generated images is calculated automatically during evaluation (using the evaluation code from DM-GAN). For IS and FID, we also directly use the evaluation scripts from DM-GAN.
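For reference, the Inception Score can be computed from Inception-v3 softmax outputs as in this generic sketch (not the DM-GAN script itself):

```python
# Generic Inception Score: exp(mean KL(p(y|x) || p(y))), averaged over splits.
import numpy as np

def inception_score(probs, splits=10):
    # probs: (N, 1000) softmax outputs of Inception-v3 on generated images
    scores = []
    for part in np.array_split(probs, splits):
        marginal = part.mean(axis=0, keepdims=True)          # estimate of p(y)
        kl = (part * (np.log(part) - np.log(marginal))).sum(axis=1)
        scores.append(np.exp(kl.mean()))
    return float(np.mean(scores)), float(np.std(scores))
```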

