pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.


Mistral support

Nikita-Sherstnev opened this issue

Would it be hard to adapt this code for Mistral? I tried the OpenOrca version and set vocab_size to 32002 in the config, but the shapes did not match:

File "/experiments/dev/nsherstnev/gpt-fast/scripts/convert_hf_checkpoint.py", line 61, in permute
    w.view(n_head, 2, config.head_dim // 2, dim)
RuntimeError: shape '[32, 2, 64, 4096]' is invalid for input of size 4194304

You'll need to change a few more configuration params (e.g. n_local_heads should be 8, since Mistral-7B uses grouped-query attention with 8 key/value heads rather than one KV head per attention head).
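The size in the error message is consistent with that: with 8 KV heads of head_dim 128, k_proj.weight has 1024 rows rather than 4096, and the Llama-style permute in convert_hf_checkpoint.py tries to view it with the full head count. A minimal sketch of the arithmetic (shapes inferred from the error message, not taken from the repo):

```python
import torch

# Mistral-7B k_proj.weight: (n_local_heads * head_dim, dim) = (8 * 128, 4096)
w = torch.empty(8 * 128, 4096)
print(w.numel())  # 4194304 -- the "input of size" in the RuntimeError

# Viewing with the full head count (as for Llama-style MHA) needs
# 32 * 2 * 64 * 4096 = 16_777_216 elements, hence the shape error:
#   w.view(32, 2, 64, 4096)  # RuntimeError
# Viewing with n_local_heads=8 matches the element count:
w.view(8, 2, 64, 4096)  # OK
```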

I'd copy them from here: https://huggingface.co/docs/transformers/main/model_doc/mistral#transformers.MistralConfig
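For reference, mapping the MistralConfig defaults onto gpt-fast's config dict might look roughly like the sketch below. The entry name and the field mapping are assumptions based on the MistralConfig docs and gpt-fast's ModelArgs conventions; check the actual merged change for the authoritative values.

```python
# Hypothetical gpt-fast config entry for Mistral-7B, derived from
# transformers' MistralConfig defaults (not the merged PR itself).
transformer_configs["Mistral-7B"] = dict(
    n_layer=32,               # num_hidden_layers
    n_head=32,                # num_attention_heads
    n_local_heads=8,          # num_key_value_heads (grouped-query attention)
    dim=4096,                 # hidden_size
    intermediate_size=14336,  # MLP hidden width
    vocab_size=32000,         # 32002 for the OpenOrca fine-tune's extra tokens
)
```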

Done in #116. The issue can be closed now.