Some doubts about the usage of `DelayedScaling.interval`.

Question

wzzju opened this issue 2 months ago

Is the `interval` attribute of `DelayedScaling` unused in PyTorch in the current TransformerEngine? In other words, does the value of `DelayedScaling.interval` have any effect on how often the scaling factor is recomputed in PyTorch? I have carefully reviewed the TransformerEngine source code and found no place in the PyTorch code where `DelayedScaling.interval` is used to control how often the scaling factor is computed.
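
To make the question concrete, here is a minimal pure-Python sketch of the behaviour one might expect from an `interval` setting: the scaling factor is recomputed only every `interval` steps and reused in between. This is not TransformerEngine code. The class `ToyDelayedScaler`, its `step` method, and the counters are hypothetical names made up for illustration, and the update rule (FP8 maximum divided by the largest recent amax, then by `2 ** margin`) is a simplifying assumption.

```python
from collections import deque

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


class ToyDelayedScaler:
    """Hypothetical illustration only; not part of TransformerEngine."""

    def __init__(self, interval=1, amax_history_len=4, margin=0):
        self.interval = interval
        self.margin = margin
        # Rolling window of the most recent per-step amax values.
        self.amax_history = deque(maxlen=amax_history_len)
        self.scale = 1.0
        self.step_count = 0
        self.num_updates = 0

    def step(self, amax):
        """Record this step's amax and return the scale currently in effect."""
        self.amax_history.append(float(amax))
        self.step_count += 1
        # Recompute the scale only on every `interval`-th step.
        if self.step_count % self.interval == 0:
            amax_max = max(self.amax_history)
            if amax_max > 0.0:
                self.scale = FP8_E4M3_MAX / amax_max / (2 ** self.margin)
            self.num_updates += 1
        return self.scale


# With interval=2 the scale changes on steps 2 and 4 only.
scaler = ToyDelayedScaler(interval=2)
scales = [scaler.step(a) for a in [1.0, 2.0, 4.0, 8.0]]
```

If `interval` were honoured in this way, a larger value would leave the scale stale for more steps. The question is whether anything equivalent to the `step_count % interval` check exists in the PyTorch code.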