NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Home Page: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html

Can CUDA 12.1.1 really be used for compilation?

leizhao1234 opened this issue

I used CUDA 12.1.1 to build TE from source, on the stable, main, and v1.3 branches. All of them install successfully, but the flash-attention that TE installs doesn't work at all:

import flash_attn_2_cuda as flash_attn_cuda
ImportError: /share/home/zl/miniconda/envs/mega/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

The undefined symbol actually comes from PyTorch; the CUDA version seems unrelated.

@ptrblck Do you have some recommendations on debugging this symbol issue?
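For what it's worth, here is a rough way to inspect this kind of error (a sketch, not an official procedure): demangle the missing symbol and check whether the libtorch libraries bundled with the installed PyTorch actually export it. It assumes a Linux pip install of PyTorch and that binutils (nm, c++filt) are on PATH.

# Debugging sketch: demangle the undefined symbol and look for it in libtorch.
# Assumes a Linux PyTorch wheel and binutils (nm, c++filt) on PATH.
import os
import subprocess
import torch

symbol = ("_ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optional"
          "INS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE")

# c++filt shows this is at::_ops::zeros::call(...), i.e. a PyTorch operator symbol.
demangled = subprocess.run(["c++filt", symbol], capture_output=True, text=True)
print(demangled.stdout.strip())

# Which PyTorch (and CUDA build) is the extension being loaded against?
print("torch:", torch.__version__, "cuda:", torch.version.cuda)

# Search the libtorch shared libraries shipped with this PyTorch for the symbol.
libdir = os.path.join(os.path.dirname(torch.__file__), "lib")
for lib in ("libtorch_cpu.so", "libtorch_python.so"):
    out = subprocess.run(["nm", "-D", os.path.join(libdir, lib)],
                         capture_output=True, text=True).stdout
    print(lib, "exports symbol:", symbol in out)

If the symbol is missing from the installed libtorch, the flash-attn extension was most likely built against a different PyTorch version (an ABI mismatch), not against the CUDA toolkit you used.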

It shouldn't be a PyTorch problem: if I compile FlashAttention directly from source, there is no issue, but when FlashAttention is installed through TE, I get the undefined symbols.

This is related to the flash-attn version: TE currently pins flash-attn<=2.4.2, and there seems to be an issue with v2.4.2. Installing flash-attn from source or installing v2.5.5 can help.
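After reinstalling, a quick sanity check along these lines (just a sketch, not part of TE) confirms which flash-attn version is actually installed and whether its CUDA extension now imports cleanly against the current PyTorch:

# Post-install sanity check (sketch): verify versions and that the CUDA
# extension imports against the currently installed PyTorch.
import torch
print("torch:", torch.__version__, "cuda:", torch.version.cuda)

# This is the import that fails with the undefined-symbol error when the
# extension was built against a different PyTorch ABI.
import flash_attn_2_cuda  # noqa: F401

import flash_attn
print("flash-attn:", flash_attn.__version__, "- CUDA extension imported successfully")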

See #689