Support using arbitrary derivative models
ivanbaldo opened this issue · comments
Currently the models need to be specified as llama7b, for example, but what if one wants to use codellama/CodeLlama-7b-hf or meta-llama/Llama-2-7b-hf (the non-chat version), etc.? A more flexible method should be implemented in the future.
@ivanbaldo, thank you for this idea. Perhaps specifying models via a Hugging Face model ID could be implemented; that might be easier than the idea I had.
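As a rough sketch of what resolving an arbitrary model ID could look like (the function name is illustrative and not part of this project's API; the file names follow common Hugging Face Hub conventions for Llama-family checkpoints):

```rust
/// Hypothetical helper: turn a Hugging Face model ID such as
/// "codellama/CodeLlama-7b-hf" into the Hub URLs a loader would fetch.
/// Returns None if the ID is not of the form "owner/name".
fn hub_urls(model_id: &str) -> Option<Vec<String>> {
    let (owner, name) = model_id.split_once('/')?;
    if owner.is_empty() || name.is_empty() {
        return None;
    }
    // Typical files needed to load a Llama-family checkpoint.
    let files = ["config.json", "tokenizer.json", "model.safetensors"];
    Some(
        files
            .iter()
            .map(|f| format!("https://huggingface.co/{owner}/{name}/resolve/main/{f}"))
            .collect(),
    )
}

fn main() {
    // A fixed alias like "llama7b" has no owner component and is rejected,
    // while any "owner/name" ID resolves to concrete Hub URLs.
    for url in hub_urls("codellama/CodeLlama-7b-hf").unwrap() {
        println!("{url}");
    }
}
```

In practice the hf-hub crate (already used in the candle ecosystem) would handle the download and caching; the point is only that accepting an owner/name ID removes the need for a hard-coded model list.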
I was trying to port support for quantized gguf models from this candle example, but am a bit lost bringing it in:
https://github.com/huggingface/candle/blob/main/candle-examples/examples/quantized/main.rs
It might also be an issue to know which base Llama model it is in order to set the parameters correctly - I don't know if GGUF carries all the info you need in the model metadata.
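For what it's worth, a minimal sketch of inspecting a GGUF file's header (the magic bytes and little-endian version per the GGUF specification; this is not this project's loader, and a real check would go on to read the metadata key-value pairs such as "llama.embedding_length"):

```rust
/// Read the GGUF magic and version from the start of a byte buffer.
/// GGUF files begin with the ASCII magic "GGUF" followed by a
/// little-endian u32 format version. Returns None if the buffer
/// is too short or the magic does not match.
fn gguf_version(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 8 || &bytes[..4] != b"GGUF" {
        return None;
    }
    Some(u32::from_le_bytes(bytes[4..8].try_into().ok()?))
}

fn main() {
    // Fake in-memory header: magic "GGUF" + version 3 (little-endian).
    let header = [b'G', b'G', b'U', b'F', 3, 0, 0, 0];
    println!("version = {:?}", gguf_version(&header));
}
```

Candle's `candle_core::quantized::gguf_file` module does this parsing (including the metadata table) for real files, so the base-model hyperparameters could be read from there when they are present.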
GGUF would be a great addition! However, I am now working on mistral.rs, the successor to this project: https://github.com/EricLBuehler/mistral.rs
Mistral.rs currently supports quantized and normal Mistral models and can be used with arbitrary derivative models. It provides an OpenAI-compatible server, and there is a simple chat example.
Please also refer to PR #46; it can load arbitrary models under the given model architecture.
@ivanbaldo, closing this, as we now support loading the weights of arbitrary derivative models. Please feel free to reopen!