Mixtral-8x7B support
tobrun opened this issue
https://huggingface.co/docs/transformers/model_doc/mixtral
Traceback (most recent call last):
File "/home/nurbot/miniconda3/envs/lchain/bin/ct2-transformers-converter", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/nurbot/miniconda3/envs/lchain/lib/python3.11/site-packages/ctranslate2/converters/transformers.py", line 2008, in main
converter.convert_from_args(args)
File "/home/nurbot/miniconda3/envs/lchain/lib/python3.11/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
return self.convert(
^^^^^^^^^^^^^
File "/home/nurbot/miniconda3/envs/lchain/lib/python3.11/site-packages/ctranslate2/converters/converter.py", line 89, in convert
model_spec = self._load()
^^^^^^^^^^^^
File "/home/nurbot/miniconda3/envs/lchain/lib/python3.11/site-packages/ctranslate2/converters/transformers.py", line 107, in _load
raise ValueError(
ValueError: No conversion is registered for the model configuration MixtralConfig (supported configurations are: BartConfig, BertConfig, BloomConfig, CodeGenConfig, DistilBertConfig, FalconConfig, GPT2Config, GPTBigCodeConfig, GPTJConfig, GPTNeoXConfig, LlamaConfig, M2M100Config, MBartConfig, MPTConfig, MT5Config, MarianConfig, MistralConfig, MixFormerSequentialConfig, OPTConfig, PegasusConfig, PhiConfig, RWConfig, T5Config, Wav2Vec2Config, WhisperConfig, XLMRobertaConfig)
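For context, the converter keeps a registry mapping Hugging Face config class names to loader functions and raises when a name is absent, which is what the ValueError above reflects. A minimal sketch of that lookup pattern (the registry contents and names here are illustrative, not ctranslate2's actual internals):

```python
# Minimal sketch of a config-name registry lookup, mirroring the error above.
# The entries are illustrative placeholders, not ctranslate2's real table.
_MODEL_LOADERS = {
    "LlamaConfig": "load_llama",
    "MistralConfig": "load_mistral",
}

def get_loader(config_name: str) -> str:
    """Return the loader registered for a config class name, or raise."""
    try:
        return _MODEL_LOADERS[config_name]
    except KeyError:
        raise ValueError(
            "No conversion is registered for the model configuration %s "
            "(supported configurations are: %s)"
            % (config_name, ", ".join(sorted(_MODEL_LOADERS)))
        )
```

Until a `MixtralConfig` entry is added to the real registry, the converter will reject the model in exactly this way.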
The likelihood we support this in the short term is very low: it requires ~100 GB of VRAM in FP16/BF16.
We do support it 4-bit quantized on two 24 GB GPUs in OpenNMT-py:
https://huggingface.co/OpenNMT/mixtral-onmt-awq-gemv
Bear in mind the model is about 24 GB in 4-bit, so you need two 24 GB GPUs or one 48 GB GPU.
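The memory figures follow from parameter count times bytes per weight; a rough back-of-envelope, assuming the commonly cited ~46.7B total parameters for Mixtral-8x7B (a figure not stated in this thread):

```python
# Rough VRAM estimate from parameter count.
# 46.7e9 is an assumed total parameter count for Mixtral-8x7B,
# and the estimate ignores activations and quantization overhead.
params = 46.7e9
fp16_gb = params * 2 / 1e9    # 2 bytes per weight in FP16/BF16
int4_gb = params * 0.5 / 1e9  # 0.5 bytes per weight in 4-bit
print(round(fp16_gb), round(int4_gb))  # roughly 93 and 23
```

This lines up with the numbers above: close to 100 GB at 16-bit precision, and around 24 GB once quantized to 4-bit.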