jpthu17 / DiffusionRet

[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

Question about diffusion model

lmdsx opened this issue

commented

Thanks for your excellent work, but I have a question.

p = self.decoder(emb).squeeze(2)  
p += weight

Why did you add weight to this distribution? p should already contain this information, so I don't understand the purpose of this step.

Thank you for raising the issue. This is an interesting phenomenon that we observed when we were doing experiments. Specifically, we find that model training is much more stable after adding this residual connection.

You can try to remove this residual connection, but it will reduce the performance of the model.
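To make the discussion concrete, here is a minimal sketch of the pattern in the quoted snippet, written in plain Python so it runs without any dependencies. Everything except the `p = decoder(emb)` / `p += weight` shape of the computation is an assumption for illustration: `toy_decoder`, `squeeze_last`, `refine`, and the linear weights `w` are hypothetical stand-ins, not the repository's code.

```python
def toy_decoder(emb, w):
    """Hypothetical stand-in for self.decoder (NOT the repository's module).

    emb has shape [batch][num_candidates][dim]; the result has shape
    [batch][num_candidates][1], mirroring the trailing singleton axis
    that .squeeze(2) removes in the quoted snippet.
    """
    return [[[sum(x * wi for x, wi in zip(vec, w))] for vec in row] for row in emb]


def squeeze_last(t):
    """Drop the trailing singleton axis, like tensor.squeeze(2)."""
    return [[cell[0] for cell in row] for row in t]


def refine(emb, weight, w, use_residual=True):
    """Compute p = decoder(emb).squeeze(2), optionally followed by p += weight."""
    p = squeeze_last(toy_decoder(emb, w))
    if use_residual:
        # Residual connection: the decoder output is added to the incoming
        # distribution, so the decoder only has to produce a correction.
        p = [[pi + wi for pi, wi in zip(p_row, w_row)]
             for p_row, w_row in zip(p, weight)]
    return p


emb = [[[1.0, 2.0], [0.5, -1.0]]]   # one query, two candidates, dim 2
weight = [[0.3, 0.7]]               # incoming distribution for that query

# If the decoder contributes nothing (all-zero weights), the residual
# version passes `weight` through unchanged, while the plain version
# discards it entirely.
print(refine(emb, weight, [0.0, 0.0]))
print(refine(emb, weight, [0.0, 0.0], use_residual=False))
```

One possible reading, which the authors do not confirm, is that the residual keeps the output anchored to `weight` when the decoder's contribution is small. The thread itself only reports the empirical observation that training is more stable with it.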

commented

Thank you for your answer. After reading your paper, I am very interested in the denoising structure. Beyond the previous question, I have two more:

1. Why is the time-position embedding 't' placed at the second attention rather than at the first (frame) attention?
2. Why is the subsequent video feature concatenated with the previous video representation?

Are these design choices empirical, or were they verified experimentally? For the second question, is it because the second attention reduces the fine-grained frame information provided by the text?

If you can help answer these questions, I would be very grateful.

In fact, the structure of our denoising network is empirical.

In response to the first question, we have not tried the structure you mentioned. We think your idea is enlightening because placing the time position embedding in the first attention allows the model to focus on different frames at different time steps.

In response to the second question, our experience shows that concatenating all frames and text together when using contrastive loss does not work well. However, we have not tried to concatenate all the frames and text together in the diffusion models, so your idea is probably better.

commented

Thank you very much for your response. Your open-source work has been immensely helpful to me. I am looking forward to your future work.