NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.

Home Page: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html


Some doubts about the usage of `DelayedScaling.interval`.

wzzju opened this issue · comments

Is the `interval` attribute of `DelayedScaling` unused in PyTorch within the current TransformerEngine? In other words, does the value of `DelayedScaling.interval` affect how often the scaling factor is recomputed in PyTorch? I have carefully reviewed the TransformerEngine source code and did not find any place where `DelayedScaling.interval` is used in the PyTorch path to control the recomputation frequency of the scaling factor.
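For context, here is a minimal, framework-free sketch of what `interval` *would* control under the delayed-scaling scheme if it were honored: the FP8 scaling factor is recomputed from the amax history only once every `interval` steps rather than on every step. This is illustrative pseudocode based on the documented recipe semantics (amax history, `margin`, the "max" `amax_compute_algo`), not TransformerEngine's actual implementation; the class name and formula details here are assumptions for illustration.

```python
# Illustrative sketch of delayed scaling with an honored `interval`.
# NOT TransformerEngine's implementation; names and formula are assumed.

FP8_E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3


class DelayedScalingSketch:
    def __init__(self, interval=1, amax_history_len=1024, margin=0):
        self.interval = interval          # recompute scale every N steps
        self.amax_history_len = amax_history_len
        self.margin = margin              # extra headroom in powers of two
        self.amax_history = []
        self.scale = 1.0
        self.step = 0

    def update(self, current_amax):
        """Record the current step's amax; recompute the scale only
        when the step count is a multiple of `interval`."""
        self.amax_history.append(current_amax)
        self.amax_history = self.amax_history[-self.amax_history_len:]
        if self.step % self.interval == 0:
            # "max" amax_compute_algo: take the max over the history.
            amax = max(self.amax_history)
            if amax > 0:
                self.scale = FP8_E4M3_MAX / (amax * 2.0 ** self.margin)
        self.step += 1
        return self.scale
```

With `interval=2`, a step whose amax grows would not change the scale until the next recompute step, which is exactly the behavior the question asks about: whether any such gating on `interval` actually happens in the PyTorch path.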