Some doubts about the usage of `DelayedScaling.interval`.
wzzju opened this issue · comments
Is the `interval` attribute of `DelayedScaling` unused in the PyTorch path of the current TransformerEngine? In other words, does the value of `DelayedScaling.interval` affect how often the scaling factor is computed in PyTorch? I have carefully reviewed the TransformerEngine source code and did not find any use of `DelayedScaling.interval` in PyTorch to control how often the scaling factor is computed.
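
To make the question concrete, below is a minimal pure-Python sketch of the behavior one might expect `interval` to control: the scaling factor is recomputed only on every `interval`-th step, from a rolling amax history. This is not TransformerEngine code. `ToyDelayedScaler`, `count_updates`, and the update rule `scale = fp8_max / amax / 2**margin` are assumptions made for illustration only.

```python
from collections import deque

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


class ToyDelayedScaler:
    """Hypothetical stand-in for delayed scaling; NOT TransformerEngine code."""

    def __init__(self, interval=1, amax_history_len=4, margin=0):
        self.interval = interval
        self.margin = margin
        # Rolling window of the most recent amax observations.
        self.amax_history = deque(maxlen=amax_history_len)
        self.scale = 1.0
        self.step_count = 0
        self.num_updates = 0

    def step(self, amax):
        """Record one amax observation and return the scale now in effect."""
        self.amax_history.append(float(amax))
        self.step_count += 1
        # Assumed semantics: recompute the scale only every `interval` steps.
        if self.step_count % self.interval == 0:
            amax_max = max(self.amax_history)
            if amax_max > 0.0:
                self.scale = FP8_E4M3_MAX / amax_max / (2 ** self.margin)
            self.num_updates += 1
        return self.scale


def count_updates(interval, num_steps=8):
    """Count how many times the scale is recomputed over `num_steps` steps."""
    scaler = ToyDelayedScaler(interval=interval)
    for i in range(num_steps):
        scaler.step(amax=1.0 + i)
    return scaler.num_updates
```

Under these assumed semantics, `count_updates(1)` and `count_updates(4)` would differ, so changing `interval` should visibly change the update frequency. The question is whether anything equivalent happens in the PyTorch path.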