SnowdenLee / ALDM

Official implementation of "Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive" (ICLR 2024)

Home Page: https://yumengli007.github.io/ALDM/


Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive (ALDM)

🔥 Official implementation of "Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive" (ICLR 2024)



Getting Started

Our environment is built on top of ControlNet:

conda env create -f environment.yaml  
conda activate aldm
pip install mit-semseg # for segmentation network UperNet
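After installation, a quick import check can confirm that the environment is usable. The snippet below is a minimal sketch, not part of the repository: the helper `missing_modules` is hypothetical, and the module names `torch` and `mit_semseg` are assumptions (the import name of the `mit-semseg` package and the PyTorch dependency that ControlNet builds on).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names; adjust to match environment.yaml.
    required = ["torch", "mit_semseg"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated `aldm` environment; any module it reports as missing points to an incomplete installation step.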

Pretrained Models

Pretrained models can be downloaded from here and should be saved in ./checkpoint.

Dataset Preparation

Datasets should be structured as follows to enable ALDM training. Adjust the dataset paths in dataloader/cityscapes.py and dataloader/ade20k.py to match your local setup.

datasets
├── cityscapes
│   ├── gtFine
│   │   ├── train
│   │   └── val
│   └── leftImg8bit
│       ├── train
│       └── val
├── ADE20K
│   ├── annotations
│   │   ├── train
│   │   └── val
│   └── images
│       ├── train
│       └── val
└── ...
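Before training, it can help to verify that the directory layout matches the tree above. The following is a minimal sketch under stated assumptions: the helper `find_missing_dirs` is hypothetical and not part of the repository, and the root folder name `datasets` is taken from the tree and may differ from the path you configure in the dataloaders.

```python
from pathlib import Path

# Sub-directories expected under the dataset root, taken from the tree above.
EXPECTED_DIRS = [
    "cityscapes/gtFine/train",
    "cityscapes/gtFine/val",
    "cityscapes/leftImg8bit/train",
    "cityscapes/leftImg8bit/val",
    "ADE20K/annotations/train",
    "ADE20K/annotations/val",
    "ADE20K/images/train",
    "ADE20K/images/val",
]


def find_missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    # "datasets" is an assumed location; point this at your own dataset root.
    missing = find_missing_dirs("datasets")
    if missing:
        print("Missing directories:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset layout looks complete.")
```

The check only confirms that the folders exist; it does not validate file names or the contents of the annotation files.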

Inference

We provide three ways to run inference: (1) a Jupyter notebook, (2) a Gradio demo, and (3) Bash scripts.

  1. Jupyter notebook: we provide one sample layout for a quick test that does not require dataset setup.

  2. Gradio Demo:

    Run the following command after completing the dataset preparation:

    gradio gradio_demo/gradio_seg2image_cityscapes.py
    



  3. Bash scripts: we provide Bash scripts for large-scale generation over a whole dataset. The synthesized data can then be used to train downstream models, e.g., semantic segmentation networks.

Citation

If you find our work useful, please star 🌟 this repo and cite:

@inproceedings{li2024aldm,
  title={Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive},
  author={Li, Yumeng and Keuper, Margret and Zhang, Dan and Khoreva, Anna},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}

License

This project is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.

For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.

Purpose of the project

This software is a research prototype, solely developed for and published as part of the publication cited above. It will neither be maintained nor monitored in any way.

Contact

If you have questions or need help or further explanation, please feel free to open an issue or contact the author directly by email: liyumeng07@outlook.com
