NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Home Page: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html


Incorrect error message when shape is not suitable for fp8 casting

lucifer1004 opened this issue · comments

Version: latest stable

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Set dimensions.
in_features = 767
out_features = 3072
hidden_size = 2048

# Initialize model and inputs.
model = te.Linear(in_features, out_features, bias=True)
inp = torch.randn(hidden_size, in_features, device="cuda")

# Create an FP8 recipe. Note: All input args are optional.
fp8_recipe = recipe.DelayedScaling(
    margin=0, interval=1, fp8_format=recipe.Format.E4M3)

# Enable autocasting for the forward pass
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

loss = out.sum()
loss.backward()

The error message is:

AssertionError: Tensor dimensions are not compatible for FP8 execution: (2048 % 8 != 0, 767 % 16 != 0)

But it is obvious that 2048 % 8 == 0, so the first clause of the message is wrong.

def assert_dim_for_fp8_exec(tensor: torch.Tensor) -> None:
    """For fp8 fprop (TN layout), inputs and weights must be such
    that dim0 is divisible by 8 and dim1 is divisible by 16.
    """
    # single tensor check so it's clear which tensor is triggering the assertion
    assert check_dim_for_fp8_exec(tensor), (
        "Tensor dimensions are not compatible for FP8 execution: "
        f"({tensor.shape[0]} % 8 != 0, {tensor.shape[1]} % 16 != 0)"
    )

This function should be improved: it unconditionally prints both "!= 0" clauses even when only one of the two divisibility conditions actually fails.

767 % 16 != 0, so the assertion itself fires correctly; the message should clarify that both conditions are required and report only the one(s) that fail.
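One possible fix along those lines is to build the message from only the failing condition(s). A minimal sketch, not the library's actual implementation: it is duck-typed on `.shape` so it runs without `torch`, and `check_dim_for_fp8_exec` is reimplemented here for illustration rather than imported from TransformerEngine.

```python
def check_dim_for_fp8_exec(tensor) -> bool:
    # FP8 fprop (TN layout): dim0 must be divisible by 8, dim1 by 16.
    return tensor.shape[0] % 8 == 0 and tensor.shape[1] % 16 == 0


def assert_dim_for_fp8_exec(tensor) -> None:
    """For fp8 fprop (TN layout), inputs and weights must be such
    that dim0 is divisible by 8 and dim1 is divisible by 16.
    """
    if not check_dim_for_fp8_exec(tensor):
        # Name only the condition(s) that actually fail, so the
        # message never claims e.g. "2048 % 8 != 0" when it is 0.
        problems = []
        if tensor.shape[0] % 8 != 0:
            problems.append(f"{tensor.shape[0]} % 8 != 0")
        if tensor.shape[1] % 16 != 0:
            problems.append(f"{tensor.shape[1]} % 16 != 0")
        raise AssertionError(
            "Tensor dimensions are not compatible for FP8 execution: "
            "(" + ", ".join(problems) + ")"
        )
```

With the shapes from the reproduction above, a (2048, 767) tensor would now produce "Tensor dimensions are not compatible for FP8 execution: (767 % 16 != 0)", with no mention of the dim0 check that passed.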