[Question] convert HF tokenizer to maxtext tokenizer?
YannDubs opened this issue
`llama_or_mistral_ckpt.py` provides the code to convert LLaMA/Mistral weights to MaxText ones. Is there a script to do the same for the tokenizer, and more generally for any HF tokenizer?
Thanks!
For Mistral, download their tokenizer from https://github.com/mistralai/mistral-src; no conversion is needed.
For the Mistral tokenizer, I downloaded their model using `wget https://models.mistralcdn.com/mistral-7b-v0-1/mistral-7B-v0.1.tar`. After that, should I put the extracted `mistral-7B-v0.1/tokenizer.model` directly under `maxtext/assets`, and is everything then all set?
Thank you very much for your time and help!
@LeoXinhaoLee using `tokenizer_path="mistral-7B-v0.1/tokenizer.model"` worked for me. Closing as a result!
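
For anyone following the same steps, here is a minimal sketch of pulling only `tokenizer.model` out of the downloaded tar and building the `tokenizer_path` value mentioned above. The helper name `extract_tokenizer` and the `assets` destination are hypothetical, and it assumes the archive contains a member named `tokenizer.model`, as the extracted path in this thread suggests. It only uses the Python standard library and does not touch MaxText itself.

```python
import tarfile
from pathlib import Path


def extract_tokenizer(tar_path, dest_dir, name="tokenizer.model"):
    """Copy the first archive member called `name` into dest_dir and return its path.

    Hypothetical helper: the member name and destination are assumptions,
    not something MaxText requires.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            # Match on the file name only, so the parent folder inside the tar does not matter.
            if member.isfile() and Path(member.name).name == name:
                target = dest_dir / name
                # Write the bytes ourselves instead of trusting paths stored in the archive.
                target.write_bytes(archive.extractfile(member).read())
                return target
    raise FileNotFoundError(f"{name} not found in {tar_path}")


# Example usage (assumes the tar from the wget command above is in the current directory):
#   path = extract_tokenizer("mistral-7B-v0.1.tar", "assets")
#   print(f"tokenizer_path={path}")
```

The printed value can then be passed as the `tokenizer_path` setting, in the same way as the path that worked above.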