about "JIT multiple training steps together"
ShiZiqiang opened this issue
Hello, Dr. Song
Thank you for sharing this excellent work.
I saw that a parameter `n_jitted_steps` is used during training, and the code comment says: "JIT multiple training steps together for faster training." Can you explain why and how this "JIT multiple training steps together" is done? Does `n_jitted_steps` affect performance? That is, if I don't use this "JIT multiple training steps together", will the results be the same?
Thank you in advance.
`n_jitted_steps` doesn't affect sample quality or likelihoods. No matter what `n_jitted_steps` you set, you are running exactly the same training procedure. Specifically, you are jit-compiling multiple training steps so that they execute together on GPUs/TPUs, and the number of training steps to jit together is given by `n_jitted_steps`. A larger `n_jitted_steps` can make training faster at the cost of more memory usage.
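To make the idea concrete, here is a minimal sketch of jitting multiple training steps together in JAX. It is not the code from this repository: the toy `train_step`, the quadratic loss, and the learning rate are all hypothetical, and the fusing is done with `jax.lax.scan`, which compiles the whole loop of steps into one XLA program so the device does not return to Python between steps.

```python
import jax
import jax.numpy as jnp

def train_step(params, batch):
    """One plain training step (toy example): SGD on a quadratic loss."""
    grad = jax.grad(lambda p: jnp.mean((p * batch - 1.0) ** 2))(params)
    return params - 0.1 * grad

def make_multi_step(n_jitted_steps):
    """Return a jitted function that runs n_jitted_steps steps in one call."""
    @jax.jit
    def multi_step(params, batches):
        # batches has shape (n_jitted_steps, batch_size): one batch per step.
        # lax.scan threads params through the steps inside a single
        # compiled program, instead of dispatching one kernel launch
        # round-trip per step from Python.
        def body(p, batch):
            return train_step(p, batch), None
        params, _ = jax.lax.scan(body, params, batches)
        return params
    return multi_step

if __name__ == "__main__":
    params = jnp.array(0.0)
    batches = jnp.ones((4, 8))  # 4 steps fused together, batch size 8

    # Same training procedure either way: 4 single steps from Python,
    # or one fused call that runs all 4 steps on device.
    p_single = params
    for b in batches:
        p_single = train_step(p_single, b)
    p_fused = make_multi_step(4)(params, batches)

    print(bool(jnp.allclose(p_single, p_fused)))  # True
```

The fused version produces the same parameters as running the steps one by one; the only trade-off is that the compiled program holds `n_jitted_steps` batches at once, which is where the extra memory usage comes from.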
Hi, Dr. Song, thank you so much for the clear explanation. Totally understood.