Zhendong-Wang / Prompt-Diffusion

Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"

Test

Jasonhyw opened this issue

What do the a_prompt and n_prompt arguments mean? What are they used for?

'a_prompt' is a positive prompt appended to the user's prompt to improve the generation quality of Stable Diffusion (SD); it steers SD toward high-quality images with extra descriptive words. Conversely, 'n_prompt' is a negative prompt that penalizes the generation with respect to the words it contains.
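
For concreteness, here is a minimal sketch of how these two arguments typically enter a sampling call. It uses the Hugging Face diffusers pipeline rather than this repo's own sampler, and the model id and prompt strings are illustrative placeholders: a_prompt is concatenated onto the main prompt, while n_prompt is passed as the negative prompt that classifier-free guidance pushes away from.

```python
# Minimal sketch with diffusers (not this repo's sampler); the model id and
# prompt strings below are placeholders, not values from Prompt-Diffusion.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of a cat"
a_prompt = "best quality, extremely detailed"            # quality-boosting suffix
n_prompt = "lowres, bad anatomy, blurry, worst quality"  # traits to penalize

image = pipe(
    prompt=f"{prompt}, {a_prompt}",  # positive condition: user prompt + a_prompt
    negative_prompt=n_prompt,        # replaces the empty unconditional prompt in CFG
    guidance_scale=9.0,
).images[0]
image.save("cat.png")
```

With classifier-free guidance, the sampler moves toward the positive embedding and away from the negative one at each denoising step, which is why the words in n_prompt get suppressed in the output.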

Thanks for your interesting work. Can you provide the evaluation code and examples for the style transfer task?

Hi Tao, I provided the evaluation code here: #11 (comment).
Performing style transfer with a model trained on only the current six tasks is not going to work well all the time. The style transfer results shown in the paper are not random generation results; this is an area that could be improved.

Thanks. Why are the style transfer results shown in the paper not random generation results?

I also ran into this problem. When I tried the style transfer task mentioned in the paper, the output looks like a reconstruction of the example image rather than a style-transferred result.
[image: generated output]
Here is my code (by the way, I found the input prompt makes no difference in this case):
[image: code screenshot]

Style transfer significantly diverges from the six tasks explicitly trained in the model, such as segmentation-to-image, depth-to-image, and others. Given the model's specific training on these tasks, its ability to generalize effectively to tasks like style transfer, which are substantially different, is limited. This limitation aligns with common understanding in the field of machine learning, where models often excel in areas closely related to their training data and struggle with tasks that are markedly distinct.

I personally tried it a few times, found that the model can work in some cases, and shared those cases in the paper.

@ZebinHe Style transfer works on my end in the two examples I shared. I am not sure why it didn't in your case.

We also provide a modified version here: https://arxiv.org/abs/2312.01408. A ViT-based encoder is used to encode the example pairs, and the model is trained on more tasks.

Thanks a lot for your reply.

Do you have a GitHub page for the modified version of the model, iPromptDiff?
