NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Home Page: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html


Incorrect error message when shape is not suitable for fp8 casting

lucifer1004 opened this issue · comments

Version: latest stable

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Set dimensions.
in_features = 767
out_features = 3072
hidden_size = 2048

# Initialize model and inputs.
model = te.Linear(in_features, out_features, bias=True)
inp = torch.randn(hidden_size, in_features, device="cuda")

# Create an FP8 recipe. Note: All input args are optional.
fp8_recipe = recipe.DelayedScaling(
    margin=0, interval=1, fp8_format=recipe.Format.E4M3)

# Enable autocasting for the forward pass
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

loss = out.sum()
loss.backward()

The error message is:

AssertionError: Tensor dimensions are not compatible for FP8 execution: (2048 % 8 != 0, 767 % 16 != 0)

But it is obvious that 2048 % 8 == 0, so the first clause of the message is wrong.

def assert_dim_for_fp8_exec(tensor: torch.Tensor) -> None:
    """For fp8 fprop (TN layout), inputs and weights must be such
    that dim0 is divisible by 8 and dim1 is divisible by 16.
    """
    # single tensor check so it's clear which tensor is triggering the assertion
    assert check_dim_for_fp8_exec(tensor), (
        "Tensor dimensions are not compatible for FP8 execution: "
        f"({tensor.shape[0]} % 8 != 0, {tensor.shape[1]} % 16 != 0)"
    )

This function should be improved: it unconditionally prints both "!= 0" clauses even when only one of the two divisibility conditions actually fails.

767 % 16 != 0, so the assertion itself fires correctly; the message should clarify that both conditions are required and report only the one(s) that fail.
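One possible fix along those lines is to build the message from only the failing condition(s). A minimal sketch, not the library's actual implementation: it is duck-typed on `.shape` so it runs without `torch`, and `check_dim_for_fp8_exec` is reimplemented here for illustration rather than imported from TransformerEngine.

```python
def check_dim_for_fp8_exec(tensor) -> bool:
    # FP8 fprop (TN layout): dim0 must be divisible by 8, dim1 by 16.
    return tensor.shape[0] % 8 == 0 and tensor.shape[1] % 16 == 0


def assert_dim_for_fp8_exec(tensor) -> None:
    """For fp8 fprop (TN layout), inputs and weights must be such
    that dim0 is divisible by 8 and dim1 is divisible by 16.
    """
    if not check_dim_for_fp8_exec(tensor):
        # Name only the condition(s) that actually fail, so the
        # message never claims e.g. "2048 % 8 != 0" when it is 0.
        problems = []
        if tensor.shape[0] % 8 != 0:
            problems.append(f"{tensor.shape[0]} % 8 != 0")
        if tensor.shape[1] % 16 != 0:
            problems.append(f"{tensor.shape[1]} % 16 != 0")
        raise AssertionError(
            "Tensor dimensions are not compatible for FP8 execution: "
            "(" + ", ".join(problems) + ")"
        )
```

With the shapes from the reproduction above, a (2048, 767) tensor would now produce "Tensor dimensions are not compatible for FP8 execution: (767 % 16 != 0)", with no mention of the dim0 check that passed.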