llama.cpp Integration to Support Low-End Hardware Compatibility
efelem opened this issue
Description
I'm currently trying to integrate llama.cpp with Meditron for running models on lower-end hardware. Meditron is based on Llama, so in theory this should be possible. However, I'm encountering issues when attempting to convert the Meditron model using llama.cpp.
Steps to Reproduce
Either run the conversion script:

```
python3 convert-hf-to-gguf.py ../meditron-7b/
```

Output:

```
Loading model: meditron-7b
Traceback (most recent call last):
...
NotImplementedError: Architecture "LlamaForCausalLM" not supported!
```

Or launch llama.cpp directly on one of the PyTorch checkpoint shards:

```
./build/bin/main --rope-freq-scale 8.0 -m ../meditron-7b/pytorch_model-00008-of-00008.bin -p "I have pain in my leg from toes to hip"
```

Output:

```
Log start
...
error loading model: llama_model_loader: failed to load model from ../meditron-7b/pytorch_model-00008-of-00008.bin
```
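From what I can tell, two separate things may be going on here (a sketch, not verified against the Meditron weights): at the time of writing, llama.cpp's convert-hf-to-gguf.py does not handle the Llama architecture itself, and Llama-family HF checkpoints are expected to go through the legacy convert.py script instead; separately, main only loads GGUF files, so pointing -m at a pytorch_model-*.bin shard will fail regardless of architecture. A workflow along these lines may work:

```
# Sketch, assuming a llama.cpp checkout where Llama-architecture HF
# checkpoints are handled by the legacy convert.py rather than
# convert-hf-to-gguf.py.
python3 convert.py ../meditron-7b/ --outtype f16 --outfile meditron-7b-f16.gguf

# Optionally quantize for low-end hardware (Q4_K_M is a common middle ground):
./build/bin/quantize meditron-7b-f16.gguf meditron-7b-Q4_K_M.gguf Q4_K_M

# main expects a GGUF file, not a raw PyTorch shard:
./build/bin/main -m meditron-7b-Q4_K_M.gguf -p "I have pain in my leg from toes to hip"
```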
Expected Behavior
Successful integration of llama.cpp with Meditron, allowing the model to run on lower-end hardware.
Actual Behavior
Encountering a NotImplementedError for the architecture "LlamaForCausalLM" when trying to convert the model, and a model-loading error when launching llama.cpp directly on a checkpoint shard.
Possible Solution
Adjustments in llama.cpp
to support the "LlamaForCausalLM" architecture used by Meditron. This could involve modifying the model conversion script or the model loading mechanism in llama.cpp
.
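Before patching anything, a quick check of which converter script already knows about the architecture might save work; a simple diagnostic from the llama.cpp repo root (assuming the architecture string appears literally in the scripts):

```
# If "LlamaForCausalLM" shows up in one of the converter scripts, that script
# is likely the one to use; if it shows up nowhere, support would indeed need
# to be added.
grep -n "LlamaForCausalLM" convert*.py
```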
Request
I kindly request the team to consider adding support for llama.cpp integration with Meditron, or to give advice on how to implement it. This would be a significant enhancement, enabling the use of Meditron models on more diverse hardware setups, especially lower-end ones.
Related: did you also try these quantized models?
https://huggingface.co/TheBloke/meditron-70B-GGUF
https://huggingface.co/TheBloke/meditron-7B-GGUF
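For anyone else landing here, a minimal sketch of trying one of those prebuilt files; the exact filename is an assumption based on TheBloke's usual naming scheme, so check the repo's file list first:

```
# Assumed filename (meditron-7b.Q4_K_M.gguf); other quantization levels
# trade file size against quality.
huggingface-cli download TheBloke/meditron-7B-GGUF meditron-7b.Q4_K_M.gguf --local-dir .
./build/bin/main -m meditron-7b.Q4_K_M.gguf -p "I have pain in my leg from toes to hip"
```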