Wrong struct size in llm-chain-llama-sys when compiling with CUDA support causes segfault on FFI call
andychenbruce opened this issue · comments
If I have

llm-chain-llama-sys = "0.12.3"

in Cargo.toml, it runs fine, but if I have

llm-chain-llama-sys = { version = "0.12.3", features = ["cuda"] }

the program segfaults.
I tracked down where it was by adding `.arg("-DCMAKE_BUILD_TYPE=Debug")` to `llm-chain-llama-sys/build.rs:84` to tell CMake to build llama.cpp with debug symbols, then stepping through the program in gdb. I found that the segfault (valgrind says it tries to jump to the invalid address `0x0: ???`) happens at what I believe to be the first FFI call the program makes.
My Rust code calls `llm_chain_llama::Executor::new_with_options(options)`, which eventually reaches `llm-chain-llama/src/context.rs:42`, an `unsafe` block that calls the FFI function `llama_context_default_params`, which starts at `llama.cpp/llama.cpp:864`.
When gdb enters `llama_context_default_params`, running `bt` shows a correct backtrace leading back to the Rust program. After stepping over the struct initialization, `bt` now shows the Rust program trying to return to `0x00000000`. I assume it's because the stack frame is getting corrupted: the C++ function `llama_context_default_params` just returns a struct by value, so the struct size is probably wrong.
I think I found the problem. Before the struct initialization at `llama.cpp:864`, if I add the line

printf("HELLO FROM CPP ASDFASDF struct size is %zu\n", sizeof(llama_context_params));

and then in Rust add the lines

eprintln!("HELLO FROM RUST struct size = {}", std::mem::size_of::<llm_chain_llama_sys::llama_context_params>());
let executor = llm_chain_llama::Executor::new_with_options(options)?;
eprintln!("GOT PAST FUNCTION");
then without `features = ["cuda"]` they print the same size of 48:
HELLO FROM RUST struct size = 48
HELLO FROM CPP ASDFASDF struct size is 48
GOT PAST FUNCTION
... program runs fine
but if I do have `features = ["cuda"]`:
HELLO FROM RUST struct size = 48
HELLO FROM CPP ASDFASDF struct size is 112
Segmentation fault (core dumped)
which I think is the problem.
The struct `llama_context_params` is defined at `llama.h:74`, and I think the problem is that it has a member `float tensor_split[LLAMA_MAX_DEVICES]`.
When cuda is enabled, `build.rs:88` passes the build flag `-DLLAMA_CUBLAS=ON`, and some preprocessor `#ifdef`s at `llama.h:5` then change the value of `LLAMA_MAX_DEVICES`. That changes the size of the struct, which breaks the C++-to-Rust bindings, corrupts the stack, and causes the segfault. I think it has something to do with how bindgen handles the preprocessor.
I have no idea how bindgen works, but at `llm-chain-llama-sys/src/bindings.rs:763` it has

const LLAMA_MAX_DEVICES: usize = 1;

and then 10 lines later, on line 773, it has

pub struct llama_context_params {
...
pub tensor_split: [::std::os::raw::c_float; LLAM_MAX_DEVICES],
...
}

and yes, that spelling mistake is intentional: `LLAM_MAX_DEVICES` is missing an A. You can replace `LLAM_MAX_DEVICES` with `asdfasdf` and it will still compile, so I don't understand where it is getting the symbol from. Regardless, I think `bindings.rs:773` should say `LLAMA_MAX_DEVICES`, not `LLAM_MAX_DEVICES`, and I don't see how that isn't a compile error right now.
EDIT: NVM, it looks like I can delete `bindings.rs` and it still works? I think bindgen regenerates it or something. It still sets `LLAMA_MAX_DEVICES` to `1` instead of correctly to `16`.
EDIT2: NVM again, it looks like `build.rs` only uses the checked-in `bindings.rs` if bindgen fails to generate the bindings.
EDIT3: llama.cpp's Makefile passes `-DGGML_USE_CUBLAS` as a flag. If you put `.clang_arg("-DGGML_USE_CUBLAS")` into `build.rs:36`, it actually sets the correct array size, but it causes compile errors in `llm-chain-llama` because now the array lengths are mismatched: 16 instead of 1.
OK, I think I fixed it.
If the cuda feature is enabled, then `build.rs` needs to pass `.clang_arg("-DGGML_USE_CUBLAS")` into bindgen's libclang invocation, which should then generate the correct bindings.
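A minimal sketch of what that change could look like. This is simplified, not the actual `build.rs` (the cmake setup and the rest of the bindgen configuration are elided); note that in build scripts enabled cargo features show up as `CARGO_FEATURE_*` environment variables rather than through `cfg!`.

```rust
// Hypothetical excerpt of llm-chain-llama-sys/build.rs (inside fn main()),
// assuming bindgen as a build-dependency.
let cuda_enabled = std::env::var("CARGO_FEATURE_CUDA").is_ok();

let mut builder = bindgen::Builder::default().header("llama.cpp/llama.h");
if cuda_enabled {
    // Same define llama.cpp's own Makefile uses for its cuBLAS build, so
    // bindgen's libclang sees LLAMA_MAX_DEVICES = 16 like the C++ side does.
    builder = builder.clang_arg("-DGGML_USE_CUBLAS");
}
let bindings = builder.generate().expect("failed to generate bindings");
// ... write `bindings` to OUT_DIR as before ...
```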
Then in `llm-chain-llama/src/context.rs:23`, the hardcoded

const LLAMA_MAX_DEVICES: usize = 1;

should be changed to use the value from the bindings. One way to do that:

const LLAMA_MAX_DEVICES: usize = llm_chain_llama_sys::LLAMA_MAX_DEVICES as usize;
Then it actually compiles. Currently I don't see any way to actually tell it to use the GPU with the options, since there is no `Opt` to set `num_gpu_layers`, but at least it doesn't segfault anymore.
I'll try to submit a pull request rn.
Edit: #175