Monalissaa / DisenDiff

[CVPR`2024, Oral] Attention Calibration for Disentangled Text-to-Image Personalization

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

DisenDiff

This repository is the official implementation of DisenDiff [CVPR-2024 Oral Presentation].

Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang, Mengping Yang, Qin Zhou, Zhe Wang
pdf

Datasets

The training images are located in datasets/images, the test prompts are located in datasets/prompts, and the processed images for evaluating image-alignment can be found in datasets/data_eval.

Key modules

The crucial constraints for optimization are implemented in the function p_losses within src/model.py.

Results

Getting Started

conda env create -f environment.yml
conda activate ldm
git clone https://github.com/CompVis/stable-diffusion.git

Fine-tuning

## run training
bash run.sh

## sample and evaluate
bash eval.sh

The run.sh and eval.sh scripts include several hyperparameters such as classes in the input image,data_path, save_path, training caption, random seed, and more. Please modify these executable files to suit your specific requirements.

Contact Us

Yanbing Zhang: zhangyanbing@mail.ecust.edu.cn
Mengping Yang: kobeshegu@gmail.com

BibTeX

@article{zhang2024attention,
  title={Attention Calibration for Disentangled Text-to-Image Personalization},
  author={Zhang, Yanbing and Yang, Mengping and Zhou, Qin and Wang, Zhe},
  journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

Acknowledgement

Our code is built upon the excellent codebase of Custom-Diffusion, we thank a lot for their work. We also kindly refer interesting researchers to these wonderful relted works:

We also thank the anonymous reviewers for their valuable suggestions during the rebuttal, which greatly help us improve the paper.

Disclaimer

This project is released for academic use. We disclaim responsibility for user-generated content. Users are solely liable for their actions. The project contributors are not legally affiliated with, nor accountable for, users' behaviors. Use the generative model responsibly, adhering to ethical and legal standards.

About

[CVPR`2024, Oral] Attention Calibration for Disentangled Text-to-Image Personalization

License:MIT License


Languages

Language:Python 96.8%Language:Shell 3.2%