Differences between code and paper

Question

thorinf opened this issue a year ago

Hi,

I've emailed Yang Song about differences between paper and code, but I thought I'd raise it as an issue so others can see. I'll update this issue if I get a reply via email.

There are some differences between the paper and the code, and I was hoping to know which is the better approach.

[Solved?] The Karras rho schedule in the paper differs slightly from the code, which matches the EDM implementation. I think this is explainable, since the two schedules are simply the reverse of each other; this may be why tn and tn+1 are switched in the code.
The input to the denoiser is scaled in the code (by c_in), but not in the paper.
The time rescaling is multiplied by a factor of 1000. My first thought is that the temporal embedding in the model may prefer larger floats, e.g. sinusoidal PE, but this is unconfirmed.
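
To make the three points above concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names are hypothetical, the defaults (eps=0.002, T=80, rho=7, sigma_data=0.5) are illustrative values, and the forms of c_in and of the log-based time conditioning are assumed from the EDM preconditioning, with the factor of 1000 taken from the description above.

```python
import math


def paper_schedule(n, eps=0.002, t_max=80.0, rho=7.0):
    """Increasing schedule t_1 = eps < ... < t_N = t_max (paper ordering)."""
    lo, hi = eps ** (1 / rho), t_max ** (1 / rho)
    # i = 0..n-1 plays the role of (i - 1) in the paper's 1-based formula.
    return [(lo + i / (n - 1) * (hi - lo)) ** rho for i in range(n)]


def edm_schedule(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Decreasing schedule sigma_max > ... > sigma_min (EDM ordering)."""
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


def c_in(sigma, sigma_data=0.5):
    """Assumed EDM-style input scaling applied to x before the network."""
    return 1.0 / math.sqrt(sigma ** 2 + sigma_data ** 2)


def rescaled_time(sigma, factor=1000.0):
    """Assumed time conditioning: EDM's 0.25 * ln(sigma) times the extra factor."""
    return factor * 0.25 * math.log(sigma)


paper = paper_schedule(18)
edm = edm_schedule(18)
# The two schedules contain the same values in opposite order.
schedules_match = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(paper, reversed(edm))
)
print(schedules_match)
```

Because the values coincide once one list is reversed, indexing the EDM-ordered list with n and n+1 visits the same pair of noise levels as the paper, only with the roles of tn and tn+1 exchanged.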

Any advice would be really appreciated, thank you.

sharkDDD · Answer 1 · Sun Apr 23 2023 16:45:30 GMT+0800 (China Standard Time)

I found some further differences between the paper and the code:
1: Equation (6) in the paper computes x_tn from x_tn+1, but the implementation in karras_diffusion.py does not; exchanging x_t2 and x_t may be the correct fix.
2: Also in karras_diffusion.py, the euler_solver does not use the score function mentioned in the paper.

Yuanzhi Zhu · Answer 2 · Sun Apr 23 2023 18:42:55 GMT+0800 (China Standard Time)

Hi @sharkDDD
The first question is mentioned in #12 (comment)
Indeed, the t schedule is computed in reverse order (https://github.com/openai/consistency_models/blob/main/cm/karras_diffusion.py#L178) compared to the paper, so the update works out correctly in the end.
For question 2, you need to know the relation between the score function and a denoiser (e.g. https://twitter.com/iScienceLuvr/status/1592860080151891969)
best,
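
The score/denoiser relation mentioned for question 2 can be checked on a toy problem. The sketch below assumes the EDM parameterization sigma(t) = t, where the score is (D(x, t) - x) / t^2 and the probability flow ODE is dx/dt = -t * score = (x - D(x, t)) / t. All names are hypothetical, and the 1-D Gaussian data model is chosen only because its denoiser and score are known in closed form; it does not reproduce the repository's euler_solver.

```python
import math

SIGMA_DATA = 0.5  # hypothetical std of the 1-D Gaussian toy data


def toy_denoiser(x, sigma):
    """Exact E[x0 | x] for x0 ~ N(0, SIGMA_DATA^2) and x = x0 + sigma * noise."""
    return SIGMA_DATA ** 2 / (SIGMA_DATA ** 2 + sigma ** 2) * x


def toy_score(x, sigma):
    """Exact score of the noisy marginal N(0, SIGMA_DATA^2 + sigma^2)."""
    return -x / (SIGMA_DATA ** 2 + sigma ** 2)


def score_from_denoiser(denoiser, x, sigma):
    """Recover the score from a denoiser: (D(x, sigma) - x) / sigma^2."""
    return (denoiser(x, sigma) - x) / sigma ** 2


def euler_step_with_denoiser(denoiser, x, t, next_t):
    """One Euler step of the ODE written with the denoiser only."""
    d = (x - denoiser(x, t)) / t
    return x + (next_t - t) * d


def euler_step_with_score(score, x, t, next_t):
    """The same Euler step written with the score: dx/dt = -t * score."""
    d = -t * score(x, t)
    return x + (next_t - t) * d


x, t, next_t = 1.3, 2.0, 1.5
same_score = math.isclose(score_from_denoiser(toy_denoiser, x, t), toy_score(x, t))
same_step = math.isclose(
    euler_step_with_denoiser(toy_denoiser, x, t, next_t),
    euler_step_with_score(toy_score, x, t, next_t),
)
print(same_score, same_step)
```

Under these assumptions, an Euler solver that only calls the denoiser still uses the score implicitly, which is one way to read the answer to question 2.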