Can this model be inferenced/quantized using llama.cpp?
Adawann opened this issue · comments
ZhouShengJie commented
As the title says.
Alwaysloser commented
error loading model: create_tensor: tensor 'output_norm.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/Xwin-LM-70B-V0.1/ggml-model-f32.gguf'
main: error: unable to load model
cannot run inference :(
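For what it's worth, a `create_tensor: tensor 'output_norm.weight' not found` error at load time usually means the GGUF file is incomplete or was produced by an outdated conversion script, so re-converting from the original weights with a current llama.cpp checkout is worth trying first. A minimal sketch of the usual workflow (the model directory and output paths here are assumptions based on the log above):

```shell
# 1. Re-convert the original HF weights to GGUF.
#    f16 is usually preferred over f32 to halve the disk footprint.
python convert.py ./models/Xwin-LM-70B-V0.1 --outtype f16

# 2. Quantize the converted model.
#    q4_K_M is a common size/quality tradeoff for 70B models.
./quantize ./models/Xwin-LM-70B-V0.1/ggml-model-f16.gguf \
           ./models/Xwin-LM-70B-V0.1/ggml-model-q4_K_M.gguf q4_K_M

# 3. Run inference on the quantized file.
./main -m ./models/Xwin-LM-70B-V0.1/ggml-model-q4_K_M.gguf -p "Hello" -n 64
```

If the re-converted file still fails to load, the download of the original weights may itself be incomplete; verifying the checksums of the safetensors/bin shards would be the next step.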
asfandsaleem commented
This seems to be a problem specific to your setup. I tried working with Xwin LMs and they worked great; I just used LM Studio.
Louis commented
@asfandsaleem how did you get it to work in LM Studio? I tried the Vicuna preset without success.