NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Home Page: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html

How to disable fused_attention when building?

janelu9 opened this issue

 [4/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
      FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
      /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-9h34s3cg/transformer_engine -I/tmp/pip-req-build-9h34s3cg/transformer_engine/common/include -I/tmp/pip-req-build-9h34s3cg/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-9h34s3cg/build/cmake/common/string_headers -isystem /usr/local/cuda/targets/x86_64-linux/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o.d -x cu -c /tmp/pip-req-build-9h34s3cg/transformer_engine/common/fused_attn/fused_attn_f16_arbitrary_seqlen.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
      Killed

My CUDA version is 11.8.
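
A bare "Killed" from nvcc during a parallel build is usually the Linux out-of-memory killer terminating the compiler, not a problem in the fused-attention source itself; WSL2 VMs in particular often have limited RAM. Assuming a standard Linux/WSL2 shell, this can be confirmed with:

    # Check whether the kernel OOM killer ended the compiler process,
    # and how much RAM/swap the build environment actually has.
    sudo dmesg | grep -i -E "out of memory|killed process"
    free -h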

Hmm, I don't think the issue you are seeing is actually due to the fused attention itself. Could you try changing this line: https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/CMakeLists.txt#L17 from 4 threads to 1 (so it should have --threads 1 instead of --threads 4 in it) and then compiling again?
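
A minimal sketch of that change, assuming the literal string --threads 4 still appears on that line of transformer_engine/CMakeLists.txt in your checkout:

    # Drop nvcc's per-file compilation parallelism from 4 threads to 1
    # to lower peak memory use during the build.
    sed -i 's/--threads 4/--threads 1/' transformer_engine/CMakeLists.txt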

It failed again in WSL2. Could you share a prebuilt wheel for torch 2.1 + CUDA 11.8?

Well, it compiled successfully after setting MAX_JOBS=1 and downgrading flash-attn to 2.4.2. Thanks!
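
Roughly, the sequence described above looks like the following; it assumes the build honours the MAX_JOBS environment variable and uses the flash-attn version mentioned in the comment:

    # Pin flash-attn, then build Transformer Engine with a single
    # compilation job so one nvcc invocation does not exhaust RAM.
    pip install flash-attn==2.4.2
    MAX_JOBS=1 pip install git+https://github.com/NVIDIA/TransformerEngine.git@main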