EricLBuehler / candle-vllm

Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server.

Support using arbitrary derivative models

ivanbaldo opened this issue

Currently the models need to be specified as, for example, llama7b, but what if one wants to use codellama/CodeLlama-7b-hf or meta-llama/Llama-2-7b-hf (the non-chat version), etc.?
A more flexible method should be implemented in the future.
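
For reference, a minimal sketch of what "specify by model ID" could look like using the hf-hub crate. This is an illustration only, not candle-vllm's actual loader; the `fetch_model_files` helper and the file names it requests are assumptions.

```rust
use hf_hub::api::sync::Api;
use std::path::PathBuf;

/// Hypothetical helper: resolve an arbitrary Hugging Face model ID and
/// download its config and tokenizer, instead of relying on a hard-coded
/// `llama7b` variant.
fn fetch_model_files(model_id: &str) -> anyhow::Result<(PathBuf, PathBuf)> {
    let api = Api::new()?;
    let repo = api.model(model_id.to_string());
    // Any derivative repo that ships these files resolves the same way.
    let config = repo.get("config.json")?;
    let tokenizer = repo.get("tokenizer.json")?;
    Ok((config, tokenizer))
}

fn main() -> anyhow::Result<()> {
    // e.g. "codellama/CodeLlama-7b-hf" or "meta-llama/Llama-2-7b-hf"
    let (config, tokenizer) = fetch_model_files("codellama/CodeLlama-7b-hf")?;
    println!("config: {config:?}\ntokenizer: {tokenizer:?}");
    Ok(())
}
```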

@ivanbaldo, thank you for this idea. Perhaps specifying models via a model ID could be implemented.

This might be easier than the idea I had.
I was trying to port support for quantized GGUF models from this candle example, but I am a bit lost bringing it in:
https://github.com/huggingface/candle/blob/main/candle-examples/examples/quantized/main.rs
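
For anyone following along, the loading path in the linked example looks roughly like the sketch below. The APIs may have moved since, so treat the `load_quantized` helper as illustrative and check the example itself for the current signatures.

```rust
use candle_core::quantized::gguf_file;
use candle_core::Device;
use candle_transformers::models::quantized_llama::ModelWeights;

/// Illustrative helper mirroring the linked example: read a GGUF file and
/// build quantized Llama weights from it.
fn load_quantized(path: &str, device: &Device) -> anyhow::Result<ModelWeights> {
    let mut file = std::fs::File::open(path)?;
    // The GGUF header holds the tensor layout plus a key/value metadata table.
    let content = gguf_file::Content::read(&mut file)?;
    // Build the quantized Llama model directly from the GGUF tensors.
    let model = ModelWeights::from_gguf(content, &mut file, device)?;
    Ok(model)
}
```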

It might also be an issue to know the base Llama model there so the parameters can be set correctly - I don't know whether GGUF has all the info you need in its model metadata.
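
GGUF files do carry a key/value metadata table with the usual Llama hyperparameters (layer count, head count, RoPE settings, and so on), although the exact keys depend on the exporter. A quick way to check what a given file provides is to dump that table with candle's gguf_file module; the `dump_metadata` helper here is illustrative.

```rust
use candle_core::quantized::gguf_file;

/// Illustrative helper: print every metadata key/value pair in a GGUF file.
fn dump_metadata(path: &str) -> anyhow::Result<()> {
    let mut file = std::fs::File::open(path)?;
    let content = gguf_file::Content::read(&mut file)?;
    // Keys such as "llama.block_count" or "llama.attention.head_count"
    // usually carry the hyperparameters a runner needs to configure itself.
    for (key, value) in &content.metadata {
        println!("{key}: {value:?}");
    }
    Ok(())
}
```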

GGUF would be a great addition! However, I am now working on mistral.rs, the successor to this project: https://github.com/EricLBuehler/mistral.rs

Mistral.rs currently supports quantized and unquantized Mistral models and may be used with arbitrary derivative models. It provides an OpenAI-compatible server, and there is a simple chat example.
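
As a rough illustration of what an OpenAI-compatible server allows, a client call might look like the following. The port, route, and model name here are assumptions, so check the mistral.rs README for the real invocation.

```rust
use serde_json::json;

fn main() -> anyhow::Result<()> {
    // Placeholder port and model name; adjust to however the server was launched.
    let body = json!({
        "model": "mistral",
        "messages": [{ "role": "user", "content": "Hello!" }]
    });
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:8080/v1/chat/completions")
        .json(&body)
        .send()?
        .json()?;
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}
```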

Currently the models need to be specified as, for example, llama7b, but what if one wants to use codellama/CodeLlama-7b-hf or meta-llama/Llama-2-7b-hf (the non-chat version), etc.? A more flexible method should be implemented in the future.

Please also refer to PR #46; it can load arbitrary models under a given model architecture.

@ivanbaldo, closing this as we now support loading weights of arbitrary derivative models. Please feel free to reopen!