chenllliang / Robust-Diffusion

Source code for project report "On The Robustness of Diffusion-Based Text-to-Image Generation" in CV-2022-Fall.


Members: Liang Chen, Zhe Yang, Zheng Li

Model Training

cd ModelTraining

Data Preparation

Before running this experiment, please download the images of the MSCOCO 2017 dataset from https://cocodataset.org/#download

Environment Setup

We use the same environment as stable-diffusion (https://github.com/CompVis/stable-diffusion). A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm

Training

Specify which GPU(s) you want to use to train the model with:

accelerate config

Set proper hyper-parameters in tune.sh. Remember to set train_data_dir to the directory of your training set! Then train the model with:

bash tune.sh

If you want to use text augmentation methods such as back translation or text crop-and-swap, add one of the following flags to tune.sh (a rough sketch of the crop-and-swap augmentation is shown below the flags):

--text_augment="bt" or --text_augment="crop_swap"
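For reference, below is a minimal sketch of what a crop-and-swap style caption augmentation can look like. This is only an illustration under our own assumptions (the function name, span size, and default ratio are made up), not the exact implementation behind --text_augment="crop_swap".

# Illustrative crop-and-swap caption augmentation (not the repository's exact code).
import random

def crop_swap(caption: str, crop_ratio: float = 0.1) -> str:
    """Delete a small random span of words, then swap two random word positions."""
    words = caption.split()
    if len(words) < 4:
        return caption

    # Crop: remove a short contiguous span of words.
    span = max(1, int(len(words) * crop_ratio))
    start = random.randrange(0, len(words) - span)
    words = words[:start] + words[start + span:]

    # Swap: exchange two randomly chosen words.
    i, j = random.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

print(crop_swap("a small brown dog is running across a green field"))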

If you want to train the model with the text interpolation augmentation method, first run python encode_text.py to generate the interpolated text vectors, and then add --text_embed_dir="./text_embed_linear_p_beta1_n5.bin" to tune.sh

Inference

After training, we can generate images with the trained model, conditioned on the texts in the test set. You can generate images with:

bash generate.sh

Before generating images, set proper hyper-parameters in generate.sh: --model_name is the directory of your trained model, and --output_dir is the directory for the generated images.
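If you prefer to sample from the trained model directly in Python rather than through generate.sh, the following is a minimal sketch. It assumes the checkpoint was saved in the diffusers directory format; the model path, prompt, and seed are placeholders.

# Minimal sketch: load a fine-tuned checkpoint with diffusers and sample one image.
# The model directory, prompt, and seed below are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./path-to-your-trained-model", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)  # fix the seed for reproducibility
image = pipe("a cat sitting on a wooden bench", generator=generator).images[0]
image.save("sample.png")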

Interpolation

cd Interpolation

Text Data Augmentation Method Implementation

Hidden States Interpolation

Note that this method requires the text hidden states produced by the CLIP encoder.

python HiddenStatesInterpolation.py
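The core operation is a linear blend of the CLIP text hidden states of two captions. Below is a minimal sketch under our own assumptions; the CLIP checkpoint, captions, mixing weight, and output file name are placeholders, not the exact settings of HiddenStatesInterpolation.py.

# Illustrative hidden-state interpolation between two captions.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()

def encode(caption: str) -> torch.Tensor:
    """Return the CLIP text encoder's last hidden states, shape (1, 77, 768)."""
    tokens = tokenizer(caption, padding="max_length", max_length=77,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        return text_encoder(**tokens).last_hidden_state

h_a = encode("a dog running on the beach")
h_b = encode("a puppy playing near the ocean")

beta = 0.5  # mixing weight between the two captions
h_mix = beta * h_a + (1.0 - beta) * h_b  # interpolated conditioning for the diffusion model

torch.save({"a dog running on the beach": h_mix}, "text_embed_example.bin")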

Other Augmentation Methods (Random Deletion and Back-Translation)

We provide a Jupyter notebook; please run Interpolation.ipynb

Robustness Analysis

cd RobustnessAnalysis

We measure three kinds of similarity (a minimal CLIP-based sketch follows the list):

  1. Image similarity among random seeds, in Similarity_with_seed.ipynb: the similarity between images generated from the same texts but with different seeds.
  2. Similarity within similar texts (intra-group similarity), in similarity.ipynb: the similarity between images generated from a group of similar texts.
  3. Faithfulness between image and text, in text_img_similarity.ipynb: the similarity between the generated images and their texts.
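For reference, here is a minimal sketch of how the image-image and image-text similarities can be computed with CLIP. The model name, image paths, and caption are placeholders; the notebooks may use different models or settings.

# Illustrative CLIP-based similarity computation (placeholder paths and caption).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def image_features(path: str) -> torch.Tensor:
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        feat = model.get_image_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)  # L2-normalise so dot product = cosine similarity

def text_features(caption: str) -> torch.Tensor:
    inputs = processor(text=[caption], return_tensors="pt", padding=True)
    with torch.no_grad():
        feat = model.get_text_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)

img_a = image_features("seed0.png")   # same prompt, seed 0
img_b = image_features("seed1.png")   # same prompt, seed 1
txt = text_features("a cat sitting on a wooden bench")

print("image-image similarity across seeds:", (img_a @ img_b.T).item())
print("image-text faithfulness:", (img_a @ txt.T).item())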
