Document use of Mistral
borisdayma opened this issue
It looks like you already support Mistral, though maybe missing sliding-window attention (a sketch of that masking pattern follows the list below).
Would be great to:
- add a section about it in https://github.com/google/maxtext#supported-open-models
- explain how to do inference (similar to the Gemma set-up instructions, plus use of `decode.py`)
- share the converted weights or add a conversion script
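For reference, sliding-window attention just restricts the causal mask so each token attends to at most the last `window` positions (4096 in the released Mistral 7B). A minimal JAX sketch of that mask, not MaxText's implementation:

```python
import jax.numpy as jnp

def sliding_window_causal_mask(seq_len: int, window: int) -> jnp.ndarray:
    # True where query position i may attend to key position j:
    # causal (j <= i) AND within the last `window` positions (j > i - window).
    i = jnp.arange(seq_len)[:, None]  # query positions, shape (seq_len, 1)
    j = jnp.arange(seq_len)[None, :]  # key positions, shape (1, seq_len)
    return (j <= i) & (j > i - window)

# Each row has at most `window` ones, ending at the diagonal.
print(sliding_window_causal_mask(6, 3).astype(int))
```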
Looks like this is actually available: https://github.com/google/maxtext/blob/main/end_to_end/test_mistral.sh
The only thing I had to do was replace `tokenizer.mistral` with `tokenizer.model` (is it a typo, or did you rename it in your bucket?).
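In case it helps anyone hitting the same rename: Mistral's tokenizer is a SentencePiece model, so a quick way to confirm the file is valid is to load it and round-trip some text (the path below is an assumption for illustration, not the bucket's layout):

```python
import sentencepiece as spm

# Hypothetical path; point this at the tokenizer.model file you downloaded.
sp = spm.SentencePieceProcessor(model_file="assets/tokenizer.model")

ids = sp.encode("Hello, Mistral!")
print(ids)
print(sp.decode(ids))  # should round-trip back to the original string
```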
Also, I chose to convert the bfloat16 weights to float32 rather than float16, since float16 could introduce some imprecision.
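For a concrete illustration of why float32 is the safer target (a toy sketch, not the conversion script): bfloat16 shares float32's 8-bit exponent, so widening to float32 is exact, while float16's 5-bit exponent overflows large values and flushes tiny ones.

```python
import jax.numpy as jnp

# A few representative values; 1e10 and -2e-30 are both fine in bfloat16.
w = jnp.array([1e10, 3.14159, -2e-30], dtype=jnp.bfloat16)

print(w.astype(jnp.float32))  # exact: every bfloat16 value is representable in float32
print(w.astype(jnp.float16))  # 1e10 overflows to inf; -2e-30 underflows to -0.0
```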
Can I ask what kind of TPU you are using for the test, @borisdayma? I have a v4-32 available that I'd like to use for continued pre-training of Llama2/Mistral 7B, but other frameworks have seemed sub-optimal to me so far.
It should work on a v3-8. You can also try the `decode.py` script; for me it worked with the 7B models (Gemma or Mistral).
Amazing, @borisdayma! We don't actually officially support Mistral (we do support Llama and Gemma), but we're thrilled things are working for you!
Yeah, your inference test of Mistral is correct. I compared with the `transformers` output and was getting the same results.
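For anyone wanting to reproduce that check, here's a sketch of how one might generate a reference continuation with `transformers` to compare against `decode.py` output (the model id, prompt, and generation settings are assumptions, not the exact ones used above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint, swap in the one you converted
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("I love to", return_tensors="pt")
# Greedy decoding keeps the reference deterministic for comparison.
out = model.generate(**inputs, max_new_tokens=16, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```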
I'm closing this issue, since after further testing Mistral already seems to work well.