EnzymeAD / Enzyme

High-performance automatic differentiation of LLVM and MLIR.

Home Page: https://enzyme.mit.edu

Unable to enable optimization levels above -O0 on the CUDA GPU test case

bemichel opened this issue

Hi,

Thanks to the documentation and your help, I was able to set up a representative CUDA GPU test with Enzyme in both forward and backward mode: https://fwd.gymni.ch/TWC7tS

I am using clang-14+Enzyme-0.0.81+CUDA-11.2, and the results look correct:

$> clang++ -DENABLE_ENZYME -I${CUDAPATH}/include test.cu -fplugin=${ENZYMEPATH}/lib/ClangEnzyme-14.so --cuda-gpu-arch=sm_61 -lcudart -L${CUDAPATH}/11.2/lib64
$> ./a.out
[GPU, direct] a[0]         == 12.000000                                                                                     
[GPU, direct] a[nb_cell-1] == 12.000000                                                                                     
[GPU, direct] b[0]         == 437.000000                                                                                    
[GPU, direct] b[nb_cell-1] == 437.000000
[GPU, forward] da[0]         == 1.000000
[GPU, forward] da[nb_cell-1] == 1.000000
[GPU, forward] db[0]         == 72.000000
[GPU, forward] db[nb_cell-1] == 72.000000
[GPU, backward] da[0]         == 72.000000
[GPU, backward] da[nb_cell-1] == 72.000000
[GPU, backward] db[0]         == 0.000000
[GPU, backward] db[nb_cell-1] == 0.000000
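
As a hedged sanity check on these numbers: assuming the kernel computes b independently per cell as some function b = f(a) (the test source is only linked above, so this is an assumption), and assuming the seeds are a tangent of 1 on a in forward mode and an adjoint of 1 on b in backward mode, both modes should recover the same derivative:

```latex
\begin{aligned}
\text{forward:}\quad & \dot b = f'(a)\,\dot a = f'(a)\cdot 1 = 72,\\
\text{backward:}\quad & \bar a = f'(a)\,\bar b = f'(a)\cdot 1 = 72.
\end{aligned}
```

Forward `db` and backward `da` both print 72, so under these assumptions the two modes agree on f'(a) at a = 12. The backward `db == 0` would be consistent with the output shadow having been consumed and reset during the reverse pass, although that depends on how the test initializes and prints its buffers.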

But if I try the same compilation step with -O[123], Enzyme fails:

$> clang++ -O1 -DENABLE_ENZYME -I${CUDAPATH}/include test.cu -fplugin=${ENZYMEPATH}/lib/ClangEnzyme-14.so --cuda-gpu-arch=sm_61 -lcudart -L${CUDAPATH}/11.2/lib64
clang-14: /.../gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/include/llvm/Support/Casting.h:90: static bool llvm::isa_impl_cl<To, From*>::doit(const From*) [with To = llvm::ConstantAsMetadata; From = llvm::Metadata]: Assertion `Val && "isa<> used on a null pointer"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14 -cc1 -triple x86_64-unknown-linux-gnu -target-sdk-version=11.2 -aux-triple nvptx64-nvidia-cuda -emit-obj --mrelax-relocations -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name test_jambon.cu -mrelocation-model static -mframe-pointer=none -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu x86-64 -tune-cpu generic -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -fcoverage-compilation-dir=directory/test_enzyme -resource-dir /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6 -internal-isystem /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6/include/cuda_wrappers -include __clang_cuda_runtime_wrapper.h -D ENABLE_ENZYME -I /directory/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/cuda/11.2/include -I /directory/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/math_libs/11.2/targets/x86_64-linux/include -I/directory/hwloc/2.4.1-gnu831-hpc/include -I/directory/openmpi/4.0.5-gnu831-hpc/include -I/directory/intel/oneapi/mkl/2021.2.0/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6/include -internal-isystem /usr/local/include -internal-isystem 
/usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /directory/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/cuda/11.2/include -O1 -fdeprecated-macro -fdebug-compilation-dir=directory/test_enzyme -ferror-limit 19 -fgnuc-version=4.2.1 -fcxx-exceptions -fexceptions -fcolor-diagnostics -load /directory/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so -fcuda-include-gpubinary /tmp/test_jambon-1b521c.fatbin -cuid=7e7be506b6b6c538 -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/test_jambon-3c93a8.o -x cuda test_jambon.cu
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x000000000333bb6f PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000000003338ebe SignalHandler(int) Signals.cpp:0:0
 #2 0x00001492f34f2b30 __restore_rt sigaction.c:0:0
 #3 0x00001492f1f1e84f raise (/lib64/libc.so.6+0x3784f)
 #4 0x00001492f1f08c45 abort (/lib64/libc.so.6+0x21c45)
 #5 0x00001492f1f08b19 _nl_load_domain.cold.0 loadmsgcat.c:0:0
 #6 0x00001492f1f16e36 .annobin___GI___assert_fail.end assert.c:0:0
 #7 0x00001492f1831923 (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so+0x18f923)
 #8 0x00001492f1875f8b EnzymeLogic::CreateForwardDiff(llvm::Function*, DIFFE_TYPE, llvm::ArrayRef<DIFFE_TYPE>, TypeAnalysis&, bool, DerivativeMode, bool, unsigned int, llvm::Type*, FnTypeInfo const&, std::vector<bool, std::allocator<bool> >, AugmentedReturn const*, bool) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so+0x1d3f8b)
 #9 0x00001492f180d8c4 (anonymous namespace)::EnzymeBase::HandleAutoDiff(llvm::Instruction*, unsigned int, llvm::Value*, llvm::Type*, llvm::SmallVectorImpl<llvm::Value*>&, std::map<int, llvm::Type*, std::less<int>, std::allocator<std::pair<int const, llvm::Type*> > > const&, std::vector<DIFFE_TYPE, std::allocator<DIFFE_TYPE> > const&, llvm::Function*, DerivativeMode, (anonymous namespace)::EnzymeBase::Options&, bool) Enzyme.cpp:0:0
#10 0x00001492f180fd30 (anonymous namespace)::EnzymeBase::HandleAutoDiffArguments(llvm::CallInst*, DerivativeMode, bool) Enzyme.cpp:0:0
#11 0x00001492f1812a35 (anonymous namespace)::EnzymeBase::lowerEnzymeCalls(llvm::Function&, std::set<llvm::Function*, std::less<llvm::Function*>, std::allocator<llvm::Function*> >&) Enzyme.cpp:0:0
#12 0x00001492f18166c6 (anonymous namespace)::EnzymeBase::run(llvm::Module&) Enzyme.cpp:0:0
#13 0x00001492f182fd8e llvm::detail::PassModel<llvm::Module, EnzymeNewPM, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so+0x18dd8e)
#14 0x0000000002afb0a9 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x2afb0a9)
#15 0x0000000003648736 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile> >&) (.constprop.902) BackendUtil.cpp:0:0
#16 0x000000000364a7b3 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x364a7b3)
#17 0x00000000042cc38d clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x42cc38d)
#18 0x0000000003cdba38 clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3cdba38)
#19 0x00000000050981c9 clang::ParseAST(clang::Sema&, bool, bool) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x50981c9)
#20 0x00000000042cc6e2 clang::CodeGenAction::ExecuteAction() (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x42cc6e2)
#21 0x0000000003ca9231 clang::FrontendAction::Execute() (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3ca9231)
#22 0x0000000003c3b35a clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3c3b35a)
#23 0x0000000003d6ef01 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3d6ef01)
#24 0x0000000000ed78c4 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0xed78c4)
#25 0x0000000000ed5079 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
#26 0x0000000000e08bbc main (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0xe08bbc)
#27 0x00001492f1f0a803 __libc_start_main (/lib64/libc.so.6+0x23803)
#28 0x0000000000ed395e _start (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0xed395e)
clang-14: error: unable to execute command: Aborted (core dumped)
clang-14: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 14.0.6
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin
clang-14: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-14: note: diagnostic msg: /tmp/test_jambon-69a59d.cu
clang-14: note: diagnostic msg: /tmp/test_jambon-4dc09a/test_jambon-sm_61.cu
clang-14: note: diagnostic msg: /tmp/test_jambon-69a59d.sh
clang-14: note: diagnostic msg: 

********************

Can my code not be compiled with an optimization level higher than -O0?

Or, as the documentation seems to explain (https://enzyme.mit.edu/getting_started/CUDAGuide/#cuda-example):

Note that this procedure (using ClangEnzyme as opposed to LLVMEnzyme manually) inserts Enzyme at a specific location in LLVM’s
optimization pipeline. The default ordering should be reasonable; however, the precise ordering of optimization passes may
[impact performance](https://proceedings.mlsys.org/paper/2020/file/4e732ced3463d06de0ca9a15b6153677-Paper.pdf).
If there is a performance issue that you suspect may be due to optimization ordering, please
[open an issue](https://github.com/EnzymeAD/Enzyme/issues/new).

Is there another way to run the compilation/differentiation phase so that the -O[123] options can be enabled?

Thanks for your help,
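
For anyone reproducing this, the -O0 versus -O[123] comparison can be scripted so that each level gets its own log. This is a minimal sketch, not part of the original report: the helper name `try_opt_levels` and the `build-O<level>.log` file names are hypothetical, and it only appends `-O<level>` to whatever compile command is passed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the report): run a compile command once per
# optimization level and print which levels succeed. The command is given as
# arguments; "-O<level>" is appended for each run, and the compiler output is
# saved to build-O<level>.log in the current directory.
try_opt_levels() {
    for lvl in 0 1 2 3; do
        if "$@" "-O${lvl}" >"build-O${lvl}.log" 2>&1; then
            printf '%s\n' "-O${lvl}: ok"
        else
            printf '%s\n' "-O${lvl}: FAILED (see build-O${lvl}.log)"
        fi
    done
}

# Usage with the command line from this report (CUDAPATH and ENZYMEPATH are
# the reporter's environment variables):
# try_opt_levels clang++ -DENABLE_ENZYME -I"${CUDAPATH}/include" test.cu \
#     -fplugin="${ENZYMEPATH}/lib/ClangEnzyme-14.so" --cuda-gpu-arch=sm_61 \
#     -lcudart -L"${CUDAPATH}/11.2/lib64"
```

Each log file then holds the full assertion message and stack dump for the corresponding level, which is the information requested when filing a report.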

Hi,
I tried to compile/differentiate with Enzyme v0.0.99 (clang-16+Enzyme-0.0.99+CUDA-11.2):

$> clang --version
clang version 16.0.6
...
$> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver                                                                                               
Copyright (c) 2005-2021 NVIDIA Corporation                                                                                          
Built on Thu_Jan_28_19:32:09_PST_2021
Cuda compilation tools, release 11.2, V11.2.142
Build cuda_11.2.r11.2/compiler.29558016_0
$> module show enzyme/0.0.99-gcc-12.1.0-...

It works fine with the -O0 option:

$> clang++ -O0 -DENABLE_ENZYME -I${CUDAPATH}/include test.cu -fplugin=${ENZYMEPATH}/lib/ClangEnzyme-16.so --cuda-gpu-arch=sm_61 -lcudart -L${CUDAPATH}/11.2/lib64
$> ./a.out                                                                               
argc == 1
[GPU, direct] a[0]         == 12.000000
[GPU, direct] a[nb_cell-1] == 12.000000
[GPU, direct] b[0]         == 437.000000
[GPU, direct] b[nb_cell-1] == 437.000000
[GPU, forward] da[0]         == 1.000000
[GPU, forward] da[nb_cell-1] == 1.000000
[GPU, forward] db[0]         == 72.000000
[GPU, forward] db[nb_cell-1] == 72.000000
[GPU, backward] da[0]         == 72.000000
[GPU, backward] da[nb_cell-1] == 72.000000
[GPU, backward] db[0]         == 0.000000
[GPU, backward] db[nb_cell-1] == 0.000000

But with the -O1 option it seems to return the same error:

$> clang++ -O1 -DENABLE_ENZYME -I${CUDAPATH}/include test.cu -fplugin=${ENZYMEPATH}/lib/ClangEnzyme-16.so --cuda-gpu-arch=sm_61 -lcudart -L${CUDAPATH}/11.2/lib64
...
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x000000000386282b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x386282b)
 #1 0x000000000385fccb SignalHandler(int) Signals.cpp:0:0
 #2 0x00001535e64e6b30 __restore_rt sigaction.c:0:0
 #3 0x00001535e4f1284f raise (/lib64/libc.so.6+0x3784f)
 #4 0x00001535e4efcc45 abort (/lib64/libc.so.6+0x21c45)
 #5 0x00001535e4efcb19 _nl_load_domain.cold.0 loadmsgcat.c:0:0
 #6 0x00001535e4f0ae36 .annobin___GI___assert_fail.end assert.c:0:0
 #7 0x00001535e4974b33 (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/enzyme-0.0.99-52vilpu4js7wisrre2fgjny6gq3o7ut5/lib/ClangEnzyme-16.so+0x365b33)
 #8 0x00001535e49a0437 EnzymeLogic::CreateForwardDiff(RequestContext, llvm::Function*, DIFFE_TYPE, llvm::ArrayRef<DIFFE_TYPE>, TypeAnalysis&, bool, DerivativeMode, bool, unsigned int, llvm::Type*, FnTypeInfo const&, std::vector<bool, std::allocator<bool>>, AugmentedReturn const*, bool) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/enzyme-0.0.99-52vilpu4js7wisrre2fgjny6gq3o7ut5/lib/ClangEnzyme-16.so+0x391437)
 #9 0x00001535e494ed11 (anonymous namespace)::EnzymeBase::HandleAutoDiff(llvm::Instruction*, unsigned int, llvm::Value*, llvm::Type*, llvm::SmallVectorImpl<llvm::Value*>&, std::map<int, llvm::Type*, std::less<int>, std::allocator<std::pair<int const, llvm::Type*>>> const&, std::vector<DIFFE_TYPE, std::allocator<DIFFE_TYPE>> const&, llvm::Function*, DerivativeMode, (anonymous namespace)::EnzymeBase::Options&, bool, llvm::SmallVectorImpl<llvm::CallInst*>&) Enzyme.cpp:0:0
#10 0x00001535e495090b (anonymous namespace)::EnzymeBase::HandleAutoDiffArguments(llvm::CallInst*, DerivativeMode, bool, llvm::SmallVectorImpl<llvm::CallInst*>&) Enzyme.cpp:0:0
#11 0x00001535e4957973 (anonymous namespace)::EnzymeBase::lowerEnzymeCalls(llvm::Function&, std::set<llvm::Function*, std::less<llvm::Function*>, std::allocator<llvm::Function*>>&) Enzyme.cpp:0:0
#12 0x00001535e495b0b8 (anonymous namespace)::EnzymeBase::run(llvm::Module&) Enzyme.cpp:0:0
#13 0x00001535e4973880 llvm::detail::PassModel<llvm::Module, EnzymeNewPM, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/enzyme-0.0.99-52vilpu4js7wisrre2fgjny6gq3o7ut5/lib/ClangEnzyme-16.so+0x364880)
#14 0x0000000003138d2d llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x3138d2d)
#15 0x0000000003c19373 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&) BackendUtil.cpp:0:0
#16 0x0000000003c1bd5c clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x3c1bd5c)
#17 0x0000000004a8dd52 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x4a8dd52)
#18 0x00000000043edb68 clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x43edb68)
#19 0x0000000005966575 clang::ParseAST(clang::Sema&, bool, bool) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x5966575)
#20 0x00000000043b3fb1 clang::FrontendAction::Execute() (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x43b3fb1)
#21 0x000000000433a23b clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x433a23b)
#22 0x000000000446f088 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x446f088)
#23 0x0000000001056a10 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x1056a10)
#24 0x000000000105202a ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
#25 0x00000000010531f0 clang_main(int, char**) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x10531f0)
#26 0x00001535e4efe803 __libc_start_main (/lib64/libc.so.6+0x23803)
#27 0x000000000104d3be _start (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x104d3be)
clang-16: error: unable to execute command: Aborted (core dumped)
clang-16: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 16.0.6
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin
clang-16: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-16: note: diagnostic msg: /tmp/test_jambon-e9590d.cu
clang-16: note: diagnostic msg: /tmp/test_jambon-e6f392/test_jambon-sm_61.cu
clang-16: note: diagnostic msg: /tmp/test_jambon-e9590d.sh
clang-16: note: diagnostic msg: 

********************

Do you have any idea what's wrong?

Can you paste the full log?

The full message:

$> clang++ -O1 -DENABLE_ENZYME -I/opt/tools/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/cuda/11.2/include -I/opt/tools/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/math_libs/11.2/targets/x86_64-linux/include test_jambon.cu -fplugin=/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/enzyme-0.0.99-52vilpu4js7wisrre2fgjny6gq3o7ut5/lib/ClangEnzyme-16.so --cuda-gpu-arch=sm_61 -lcudart -L/opt/tools/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/cuda/11.2/lib64
clang-16: /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/include/llvm/Support/Casting.h:109: static bool llvm::isa_impl_cl<To, const From*>::doit(const From*) [with To = llvm::ConstantAsMetadata; From = llvm::Metadata]: Assertion `Val && "isa<> used on a null pointer"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16 -cc1 -triple x86_64-unknown-linux-gnu -target-sdk-version=11.2 -aux-triple nvptx64-nvidia-cuda -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name test_jambon.cu -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=none -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu x86-64 -tune-cpu generic -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -fcoverage-compilation-dir=/visu/bemichel/dev/SoNICS/dev_doc/test_enzyme -resource-dir /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/lib/clang/16 -internal-isystem /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/lib/clang/16/include/cuda_wrappers -include __clang_cuda_runtime_wrapper.h -D ENABLE_ENZYME -I /opt/tools/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/cuda/11.2/include -I /opt/tools/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/math_libs/11.2/targets/x86_64-linux/include -I/opt/tools/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/math_libs/11.2/include -I/opt/tools/hwloc/2.4.1-gnu831-hpc/include -I/opt/tools/openmpi/4.0.5-gnu831-hpc/include -I/opt/tools/intel/oneapi/tbb/2021.2.0/include -I/opt/tools/intel/oneapi/compiler/2021.2.0/linux/include -I/opt/tools/gcc/10.2.0-gnu831/include -I/scratchm/sonics/opt_el8/linux-rhel8-broadwell/gcc-8.3.1/eigen-3.4.0-xxgaw25zr3gqeeimp5nugzxxxxlzzjfq/include/eigen3 -I/opt/tools/intel/oneapi/mpi/2021.6.0//include -I/opt/tools/intel/oneapi/mkl/2021.2.0/include -I/scratchm/sonics/opt_el8/linux-rhel8-broadwell/gcc-8.3.1/python-3.9.12-enb6bk6hdesnoo6ppwn5jnb3jivt2jcz/include/python3.9 -internal-isystem /opt/tools/gcc/12.1.0-gnu831/lib/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../include/c++/12.1.0 
-internal-isystem /opt/tools/gcc/12.1.0-gnu831/lib/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../include/c++/12.1.0/x86_64-pc-linux-gnu -internal-isystem /opt/tools/gcc/12.1.0-gnu831/lib/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../include/c++/12.1.0/backward -internal-isystem /opt/tools/gcc/12.1.0-gnu831/lib/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../include/c++/12.1.0 -internal-isystem /opt/tools/gcc/12.1.0-gnu831/lib/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../include/c++/12.1.0/x86_64-pc-linux-gnu -internal-isystem /opt/tools/gcc/12.1.0-gnu831/lib/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../include/c++/12.1.0/backward -internal-isystem /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /opt/tools/gcc/12.1.0-gnu831/lib/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../x86_64-pc-linux-gnu/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /opt/tools/gcc/12.1.0-gnu831/lib/gcc/x86_64-pc-linux-gnu/12.1.0/../../../../x86_64-pc-linux-gnu/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /opt/tools/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/cuda/11.2/include -O1 -fdeprecated-macro -fdebug-compilation-dir=/visu/bemichel/dev/SoNICS/dev_doc/test_enzyme -ferror-limit 19 -fgnuc-version=4.2.1 -fcxx-exceptions -fexceptions -fcolor-diagnostics -load /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/enzyme-0.0.99-52vilpu4js7wisrre2fgjny6gq3o7ut5/lib/ClangEnzyme-16.so -fcuda-include-gpubinary /tmp/test_jambon-1378cf.fatbin -cuid=c77be1b562716e -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/test_jambon-1b2471.o -x cuda test_jambon.cu
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x000000000386282b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x386282b)
 #1 0x000000000385fccb SignalHandler(int) Signals.cpp:0:0
 #2 0x000015320199bb30 __restore_rt sigaction.c:0:0
 #3 0x00001532003c784f raise (/lib64/libc.so.6+0x3784f)
 #4 0x00001532003b1c45 abort (/lib64/libc.so.6+0x21c45)
 #5 0x00001532003b1b19 _nl_load_domain.cold.0 loadmsgcat.c:0:0
 #6 0x00001532003bfe36 .annobin___GI___assert_fail.end assert.c:0:0
 #7 0x00001531ffe29b33 (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/enzyme-0.0.99-52vilpu4js7wisrre2fgjny6gq3o7ut5/lib/ClangEnzyme-16.so+0x365b33)
 #8 0x00001531ffe55437 EnzymeLogic::CreateForwardDiff(RequestContext, llvm::Function*, DIFFE_TYPE, llvm::ArrayRef<DIFFE_TYPE>, TypeAnalysis&, bool, DerivativeMode, bool, unsigned int, llvm::Type*, FnTypeInfo const&, std::vector<bool, std::allocator<bool>>, AugmentedReturn const*, bool) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/enzyme-0.0.99-52vilpu4js7wisrre2fgjny6gq3o7ut5/lib/ClangEnzyme-16.so+0x391437)
 #9 0x00001531ffe03d11 (anonymous namespace)::EnzymeBase::HandleAutoDiff(llvm::Instruction*, unsigned int, llvm::Value*, llvm::Type*, llvm::SmallVectorImpl<llvm::Value*>&, std::map<int, llvm::Type*, std::less<int>, std::allocator<std::pair<int const, llvm::Type*>>> const&, std::vector<DIFFE_TYPE, std::allocator<DIFFE_TYPE>> const&, llvm::Function*, DerivativeMode, (anonymous namespace)::EnzymeBase::Options&, bool, llvm::SmallVectorImpl<llvm::CallInst*>&) Enzyme.cpp:0:0
#10 0x00001531ffe0590b (anonymous namespace)::EnzymeBase::HandleAutoDiffArguments(llvm::CallInst*, DerivativeMode, bool, llvm::SmallVectorImpl<llvm::CallInst*>&) Enzyme.cpp:0:0
#11 0x00001531ffe0c973 (anonymous namespace)::EnzymeBase::lowerEnzymeCalls(llvm::Function&, std::set<llvm::Function*, std::less<llvm::Function*>, std::allocator<llvm::Function*>>&) Enzyme.cpp:0:0
#12 0x00001531ffe100b8 (anonymous namespace)::EnzymeBase::run(llvm::Module&) Enzyme.cpp:0:0
#13 0x00001531ffe28880 llvm::detail::PassModel<llvm::Module, EnzymeNewPM, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/enzyme-0.0.99-52vilpu4js7wisrre2fgjny6gq3o7ut5/lib/ClangEnzyme-16.so+0x364880)
#14 0x0000000003138d2d llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x3138d2d)
#15 0x0000000003c19373 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&) BackendUtil.cpp:0:0
#16 0x0000000003c1bd5c clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x3c1bd5c)
#17 0x0000000004a8dd52 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x4a8dd52)
#18 0x00000000043edb68 clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x43edb68)
#19 0x0000000005966575 clang::ParseAST(clang::Sema&, bool, bool) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x5966575)
#20 0x00000000043b3fb1 clang::FrontendAction::Execute() (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x43b3fb1)
#21 0x000000000433a23b clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x433a23b)
#22 0x000000000446f088 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x446f088)
#23 0x0000000001056a10 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x1056a10)
#24 0x000000000105202a ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
#25 0x00000000010531f0 clang_main(int, char**) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x10531f0)
#26 0x00001532003b3803 __libc_start_main (/lib64/libc.so.6+0x23803)
#27 0x000000000104d3be _start (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin/clang-16+0x104d3be)
clang-16: error: unable to execute command: Aborted (core dumped)
clang-16: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 16.0.6
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-12.1.0/llvm-16.0.6-wnagksngnyalxvmluiow2yuywyd4npx5/bin
clang-16: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-16: note: diagnostic msg: /tmp/test_jambon-828cba.cu
clang-16: note: diagnostic msg: /tmp/test_jambon-971738/test_jambon-sm_61.cu
clang-16: note: diagnostic msg: /tmp/test_jambon-828cba.sh
clang-16: note: diagnostic msg: 

********************

Okay, this should be fixed by #1697.