NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Home Page: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html

Can CUDA 12.1.1 really be used for compilation?

leizhao1234 opened this issue

I used CUDA 12.1.1 to build TE from source, on the stable, main, and v1.3 branches. All of them install successfully, but the flash-attention that TE installs doesn't work at all:

import flash_attn_2_cuda as flash_attn_cuda
ImportError: /share/home/zl/miniconda/envs/mega/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

The undefined symbol actually comes from PyTorch; the CUDA version seems unrelated.

@ptrblck Do you have some recommendations on debugging this symbol issue?
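For what it's worth, here is a rough way to inspect this kind of error (a sketch, not an official procedure): demangle the missing symbol and check whether the libtorch libraries bundled with the installed PyTorch actually export it. It assumes a Linux pip install of PyTorch and that binutils (nm, c++filt) are on PATH.

# Debugging sketch: demangle the undefined symbol and look for it in libtorch.
# Assumes a Linux PyTorch wheel and binutils (nm, c++filt) on PATH.
import os
import subprocess
import torch

symbol = ("_ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optional"
          "INS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE")

# c++filt shows this is at::_ops::zeros::call(...), i.e. a PyTorch operator symbol.
demangled = subprocess.run(["c++filt", symbol], capture_output=True, text=True)
print(demangled.stdout.strip())

# Which PyTorch (and CUDA build) is the extension being loaded against?
print("torch:", torch.__version__, "cuda:", torch.version.cuda)

# Search the libtorch shared libraries shipped with this PyTorch for the symbol.
libdir = os.path.join(os.path.dirname(torch.__file__), "lib")
for lib in ("libtorch_cpu.so", "libtorch_python.so"):
    out = subprocess.run(["nm", "-D", os.path.join(libdir, lib)],
                         capture_output=True, text=True).stdout
    print(lib, "exports symbol:", symbol in out)

If the symbol is missing from the installed libtorch, the flash-attn extension was most likely built against a different PyTorch version (an ABI mismatch), not against the CUDA toolkit you used.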

It shouldn't be a PyTorch problem: if I compile FlashAttention directly from source, there is no issue, but when FlashAttention is installed through TE, I get the undefined symbols.

This is related to the flash-attn version: TE currently pins flash-attn<=2.4.2, and there seems to be an issue with v2.4.2. Installing flash-attn from source or installing v2.5.5 can help.
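After reinstalling, a quick sanity check along these lines (just a sketch, not part of TE) confirms which flash-attn version is actually installed and whether its CUDA extension now imports cleanly against the current PyTorch:

# Post-install sanity check (sketch): verify versions and that the CUDA
# extension imports against the currently installed PyTorch.
import torch
print("torch:", torch.__version__, "cuda:", torch.version.cuda)

# This is the import that fails with the undefined-symbol error when the
# extension was built against a different PyTorch ABI.
import flash_attn_2_cuda  # noqa: F401

import flash_attn
print("flash-attn:", flash_attn.__version__, "- CUDA extension imported successfully")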

See #689