YingchaojieFeng / PromptMagician

PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation

Paper | Authors: Yingchaojie Feng, Xingbo Wang, Kam Kwai Wong, Sijia Wang, Yuhong Lu, Minfeng Zhu, Baicheng Wang, Wei Chen

Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts. The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords. To facilitate interactive prompt refinement, PromptMagician introduces a multi-level visualization for the cross-modal embedding of the retrieved images and recommended keywords, and supports users in specifying multiple criteria for personalized exploration. Two usage scenarios, a user study, and expert interviews demonstrate the effectiveness and usability of our system, suggesting it facilitates prompt engineering and improves the creativity support of the generative text-to-image model.
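The retrieval step described above can be illustrated with a minimal sketch. It assumes each prompt has already been embedded as a fixed-length vector and ranks database entries by cosine similarity to the query. The function names, the toy prompts, and the 3-dimensional vectors are hypothetical and are not taken from the PromptMagician code base; the actual system may embed and rank prompts differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_similar(query_vec, db_vecs, k=2):
    """Return the indices of the k database vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(db_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy data (hypothetical): three stored prompts with made-up embeddings.
db_prompts = ["a castle at sunset", "a cat on a sofa", "a fortress at dusk"]
db_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.8, 0.2, 0.1],
]
query_vec = [1.0, 0.0, 0.0]  # stands in for the embedded user prompt

top = retrieve_similar(query_vec, db_vecs, k=2)
for i in top:
    print(db_prompts[i])
```

In a real setting the vectors would come from a text or cross-modal encoder, and the retrieved prompt-image pairs would then feed the keyword recommendation step.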

How to run the system

The environment consists of a frontend (React 18.2.0, D3 7.8.2) and a backend (Python 3.7 or above).

  1. Install the Python packages (we suggest using conda for package management):
cd back-end
pip install -r requirements.txt
  2. Download DiffusionDB 2m_first_100k (we use DiffusionDB as the image retrieval database):
python /diffusionDB/download.py
  3. Download the pre-processed data (for DiffusionDB 2m_first_100k and GPU environments) and move the folders to the back-end/.cache directory. You can also create your own version by referring to workflow.py.

  4. Set up the backend (configure config.py and run_sd.sh first; we use 8 GPUs by default):

cd server
sh run_sd.sh
python server.py
  5. Set up the frontend:
cd front-end
npm install
npm start
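
Before starting the backend, it can help to confirm that the prerequisites from the steps above are in place. The following is a minimal sketch of such a check, not part of the repository: the function name `check_environment` is hypothetical, and it only verifies the Python version and that the cache directory exists and is not empty.

```python
import sys
from pathlib import Path


def check_environment(cache_dir, min_python=(3, 7)):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []

    # The README asks for Python 3.7 or above.
    if sys.version_info < min_python:
        problems.append(
            "Python %d.%d or above is required" % (min_python[0], min_python[1])
        )

    # The pre-processed data is expected inside the cache directory.
    cache = Path(cache_dir)
    if not cache.is_dir():
        problems.append(f"missing cache directory: {cache}")
    elif not any(cache.iterdir()):
        problems.append(f"cache directory is empty: {cache}")

    return problems


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for problem in check_environment("back-end/.cache"):
        print("WARNING:", problem)
```

Run from the repository root, the script prints one warning per failed check and prints nothing when both checks pass.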

How to cite

If this paper and tool help your research, please consider citing our paper:

@article{feng2023promptmagician,
  title={PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation},
  author={Feng, Yingchaojie and Wang, Xingbo and Wong, Kam Kwai and Wang, Sijia and Lu, Yuhong and Zhu, Minfeng and Wang, Baicheng and Chen, Wei},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  volume={30},
  number={1},
  pages={295--305},
  year={2024},
  doi={10.1109/TVCG.2023.3327168}
}
