Provide pruned version for weaker hardware
CommanderTvis opened this issue
It would be really useful to have a pruned version of the model (like Balaboba) so it can run on less powerful GPUs.
Quantization, even down to 4 bits, may also be feasible, as has been done successfully for LLaMA: https://github.com/ggerganov/llama.cpp
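To illustrate the idea, here is a minimal sketch of blockwise absmax 4-bit quantization in NumPy. This is a simplification, not the exact ggml/llama.cpp storage format (which packs two 4-bit values per byte and uses its own block layouts such as Q4_0); the block size of 32 and the symmetric [-7, 7] range are assumptions for the example.

```python
import numpy as np

def quantize_q4(block: np.ndarray) -> tuple[float, np.ndarray]:
    """Absmax quantization of one block of weights to a signed
    4-bit range [-7, 7] plus one float scale per block (a rough
    sketch of the scheme llama.cpp uses, not its exact format)."""
    scale = float(np.abs(block).max()) / 7.0
    if scale == 0.0:
        return 0.0, np.zeros_like(block, dtype=np.int8)
    q = np.clip(np.round(block / scale), -7, 7).astype(np.int8)
    return scale, q

def dequantize_q4(scale: float, q: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from the quantized block."""
    return q.astype(np.float32) * scale

# Quantize a toy weight tensor in blocks of 32 values
weights = np.random.randn(1024).astype(np.float32)
restored = np.concatenate(
    [dequantize_q4(*quantize_q4(b)) for b in weights.reshape(-1, 32)]
)
# Per-element reconstruction error stays within half a quantization step
print(float(np.abs(weights - restored).max()))
```

Each float32 weight collapses to 4 bits plus a shared per-block scale, so memory drops roughly 8x, which is what makes running large models on weaker GPUs plausible.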
+1. This distributed-inference technique might also be very applicable here: https://petals.ml