NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Home Page: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html

How to disable fused_attention when building?

janelu9 opened this issue

 [4/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
      FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
      /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-9h34s3cg/transformer_engine -I/tmp/pip-req-build-9h34s3cg/transformer_engine/common/include -I/tmp/pip-req-build-9h34s3cg/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-9h34s3cg/build/cmake/common/string_headers -isystem /usr/local/cuda/targets/x86_64-linux/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o.d -x cu -c /tmp/pip-req-build-9h34s3cg/transformer_engine/common/fused_attn/fused_attn_f16_arbitrary_seqlen.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
      Killed

My CUDA version is 11.8.
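
A bare "Killed" from nvcc during a parallel build is usually the Linux out-of-memory killer terminating the compiler, not a problem in the fused-attention source itself; WSL2 VMs in particular often have limited RAM. Assuming a standard Linux/WSL2 shell, this can be confirmed with:

    # Check whether the kernel OOM killer ended the compiler process,
    # and how much RAM/swap the build environment actually has.
    sudo dmesg | grep -i -E "out of memory|killed process"
    free -h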

Hmm, I don't think the issue you are seeing is actually due to the fused attention itself. Could you try changing this line: https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/CMakeLists.txt#L17 from 4 threads to 1 (so it should have --threads 1 instead of --threads 4 in it) and then compiling again?
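
A minimal sketch of that change, assuming the literal string --threads 4 still appears on that line of transformer_engine/CMakeLists.txt in your checkout:

    # Drop nvcc's per-file compilation parallelism from 4 threads to 1
    # to lower peak memory use during the build.
    sed -i 's/--threads 4/--threads 1/' transformer_engine/CMakeLists.txt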

It failed again in WSL2. Could you share a prebuilt wheel for torch 2.1 + CUDA 11.8?

Well, it compiled successfully after setting MAX_JOBS=1 and downgrading flash-attn to 2.4.2. Thanks!
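
Roughly, the sequence described above looks like the following; it assumes the build honours the MAX_JOBS environment variable and uses the flash-attn version mentioned in the comment:

    # Pin flash-attn, then build Transformer Engine with a single
    # compilation job so one nvcc invocation does not exhaust RAM.
    pip install flash-attn==2.4.2
    MAX_JOBS=1 pip install git+https://github.com/NVIDIA/TransformerEngine.git@main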