how to disable fused_attention when building?
janelu9 opened this issue · comments
[4/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-9h34s3cg/transformer_engine -I/tmp/pip-req-build-9h34s3cg/transformer_engine/common/include -I/tmp/pip-req-build-9h34s3cg/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-9h34s3cg/build/cmake/common/string_headers -isystem /usr/local/cuda/targets/x86_64-linux/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o.d -x cu -c /tmp/pip-req-build-9h34s3cg/transformer_engine/common/fused_attn/fused_attn_f16_arbitrary_seqlen.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
Killed
My CUDA version is 11.8.
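A `Killed` message with no compiler error usually means the Linux OOM killer terminated nvcc, which is common in WSL2 since its VM gets only part of the host's RAM by default. A quick way to check the memory situation before touching any build flags (commands assume a Linux/WSL2 shell; the kernel log read may silently return nothing without root):

```shell
# How much memory does the build VM actually have?
free -h
# Any OOM-killer entries in the kernel log? (stderr suppressed if unprivileged)
dmesg 2>/dev/null | grep -i 'killed process' | tail -n 3
```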
Hmm, I don't think the issue you are seeing is actually due to the fused attention itself; the `Killed` message suggests the compiler ran out of memory. Could you try changing this line: https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/CMakeLists.txt#L17 from 4 threads to 1 (so it should have `--threads 1` instead of `--threads 4` in it) and then compiling again?
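The suggested one-line change can be applied with `sed`. The exact contents of the CMake line are an assumption here, shown on a sample string for illustration; in a real checkout you would run the same substitution in-place against `transformer_engine/CMakeLists.txt`:

```shell
# Sketch: rewrite nvcc's --threads flag from 4 to 1 (sample line, not the
# verified file contents; adapt the path/line to your checkout).
echo 'set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --threads 4")' \
  | sed 's/--threads 4/--threads 1/'
```

This trades compile speed for peak memory: nvcc spawns fewer parallel compilation threads per translation unit.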
It failed again in WSL2. Could you share a wheel package built for torch 2.1 + cu118?
Well, I compiled successfully after setting `MAX_JOBS=1` and downgrading to `flash-attn==2.4.2`, thanks!
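For readers hitting the same OOM kill, the resolution above can be sketched as a two-command recipe. The repository URL and `stable` ref are assumptions for illustration; `MAX_JOBS` is the environment variable PyTorch's C++ extension build system reads to cap the number of parallel compile jobs, and the `flash-attn` pin is the one the reporter confirmed:

```shell
# Sketch, not verified end-to-end: pin flash-attn, then build TransformerEngine
# with a single compile job so peak memory stays within WSL2's VM limit.
pip install flash-attn==2.4.2
MAX_JOBS=1 pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
```

Building serially is much slower, but it keeps only one nvcc invocation resident at a time, which is usually enough to avoid the OOM killer.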