OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2


Mixtral-8x7B support

tobrun opened this issue

https://huggingface.co/docs/transformers/model_doc/mixtral
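
For context, the traceback below comes from running the CTranslate2 Transformers converter against a Mixtral checkpoint. A minimal Python sketch of the equivalent call, assuming the standard Hugging Face model ID (the exact checkpoint used in the report is not shown):

import ctranslate2

# Hypothetical reproduction: any checkpoint whose config class is
# MixtralConfig fails the same way, since no conversion is registered for it.
converter = ctranslate2.converters.TransformersConverter(
    "mistralai/Mixtral-8x7B-v0.1"  # assumed model ID, for illustration only
)
converter.convert("mixtral-ct2")  # raises the ValueError shown below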

Traceback (most recent call last):
  File "/home/nurbot/miniconda3/envs/lchain/bin/ct2-transformers-converter", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/nurbot/miniconda3/envs/lchain/lib/python3.11/site-packages/ctranslate2/converters/transformers.py", line 2008, in main
    converter.convert_from_args(args)
  File "/home/nurbot/miniconda3/envs/lchain/lib/python3.11/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
    return self.convert(
           ^^^^^^^^^^^^^
  File "/home/nurbot/miniconda3/envs/lchain/lib/python3.11/site-packages/ctranslate2/converters/converter.py", line 89, in convert
    model_spec = self._load()
                 ^^^^^^^^^^^^
  File "/home/nurbot/miniconda3/envs/lchain/lib/python3.11/site-packages/ctranslate2/converters/transformers.py", line 107, in _load
    raise ValueError(
ValueError: No conversion is registered for the model configuration MixtralConfig (supported configurations are: BartConfig, BertConfig, BloomConfig, CodeGenConfig, DistilBertConfig, FalconConfig, GPT2Config, GPTBigCodeConfig, GPTJConfig, GPTNeoXConfig, LlamaConfig, M2M100Config, MBartConfig, MPTConfig, MT5Config, MarianConfig, MistralConfig, MixFormerSequentialConfig, OPTConfig, PegasusConfig, PhiConfig, RWConfig, T5Config, Wav2Vec2Config, WhisperConfig, XLMRobertaConfig)

The likelihood that we support this in the short term is very low: the model requires ~100 GB of VRAM in FP16/BF16.
We do support it 4-bit quantized on two 24 GB GPUs in OpenNMT-py:
https://huggingface.co/OpenNMT/mixtral-onmt-awq-gemv
Bear in mind the model weighs ~24 GB in 4-bit, so you need two 24 GB GPUs or a single 48 GB GPU.
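
For reference, those sizes line up with back-of-the-envelope math on Mixtral-8x7B's ~46.7B total parameters (the published figure; real usage adds activation and KV-cache overhead on top, so these are lower bounds):

# Rough weight-memory estimate, assuming ~46.7B total parameters.
params = 46.7e9

fp16_gb = params * 2 / 1e9    # 2 bytes per weight   -> ~93 GB, i.e. the ~100 GB figure
int4_gb = params * 0.5 / 1e9  # 0.5 bytes per weight -> ~23 GB, i.e. the ~24 GB figure

print(f"FP16/BF16: ~{fp16_gb:.0f} GB")
print(f"4-bit:     ~{int4_gb:.0f} GB")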